
AOCL 2.2 changes - Majorly include LAPACK 3.9.0 support #36

Open
wants to merge 1,017 commits into
base: master
from



@rsanagap rsanagap commented Jul 5, 2020

Field,

Please review and merge them.

Thanks

varajago and others added 30 commits May 17, 2023 09:31
Total memory requirement for large matrices exceeds the range
of a 32-bit integer. Changed the type of the size argument of the local
memory allocation routine to size_t to accommodate large sizes.
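A minimal sketch of the overflow, assuming a hypothetical helper `matrix_bytes` (not the actual libflame allocation routine). On a 64-bit platform, the byte count of a 50000-by-50000 double matrix is 2e10, far beyond INT32_MAX, so the product has to be formed in size_t rather than int.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of libflame: bytes needed for an m-by-n
 * matrix of doubles. Each dimension is widened to size_t before the
 * multiplication, so the product is not truncated to 32 bits. */
static size_t matrix_bytes(int32_t m, int32_t n)
{
    return (size_t)m * (size_t)n * sizeof(double);
}

/* Hypothetical allocator taking its size as size_t, mirroring the
 * argument-type change described above. Returns NULL on failure. */
static void *alloc_matrix(int32_t m, int32_t n)
{
    return malloc(matrix_bytes(m, n));
}
```

Casting each factor before multiplying matters: `(size_t)(m * n)` would still overflow, because `m * n` is evaluated in 32-bit arithmetic first.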

Added stdc++ flag to LDFLAGS for including C++ based
aocl-utils library in the build.

Signed off by Vasanthakumar R <varajago@amd.com>

Change-Id: I1e521c373acc89cc6a32e17f55ae1f30707ba196
Updated README to use name AOCL-LAPACK for AMD optimized
version of libflame

Change-Id: I7d4b88a18dafc60dfb39e0354bf06bbc0cdd9082
Change-Id: Ia90f01c4f53e88ff94c107717a4d4127e76a29a1
When running on Linux, FLA_Apply_pivots_unb_external.c was using an
implicitly declared function for memory allocation. Corrected this code
to use malloc.

AMD-Internal: [CPUPL-3412]
Change-Id: I8d17aaad691df243633566f361f16f27389d1ed8
Fixes compiler warnings generated due to recent check-ins

Signed-off-by: Jintu Das <Jintu.Das@amd.com>
Change-Id: Iad565a6816e5a5b5bdf94b81944cffb70563283c
Incorrect AOCL_DTL_TRACE macro used. Replaced AOCL_DTL_TRACE_LOG_EXIT with AOCL_DTL_TRACE_EXIT_INDENT

Signed-off-by: bsinghpa<brijesh.singhpanwar@amd.com>
AMD-Internal: CPUPL-3439
Change-Id: Ibc121d5a741ff2ef606220047417b032203819ae
Change-Id: I4e13db9123208c9eb7a2fe533ee2e5c9421fd2f8
AMD-Internal: CPUPL-3441
Change-Id: I08d9faa1f1768e73cee3e820b2438742dee25a90
In the configure script, add clang compiler flags for debugging,
profiling and specifying the C language standard.

Other changes:
1. Update C language standard from C99 to C11 for gcc, clang and
   icc compilers. Other compilers are not relevant and are left as is.
2. Moved all warning flags to warning flags section. Note: warnings
   are enabled by default.
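As an illustration (not part of the configure script), a translation unit can confirm which C standard the selected flags produced by checking the predefined `__STDC_VERSION__` macro, which is 201112L for C11. The function name below is hypothetical.

```c
/* Hypothetical check: returns 1 when this file is compiled as C11 or
 * later (for example with -std=c11), 0 otherwise. __STDC_VERSION__ is
 * predefined by the compiler; 201112L is its value for C11. */
static int compiled_as_c11_or_later(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    return 1;
#else
    return 0;
#endif
}
```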

AMD-Internal: [CPUPL-3408]
Change-Id: Iec6f9c40a6c7e0e8091484e3f9ee041feb62f631
Change-Id: Ic261233289c701dce800f2fc384253847e805c28
AMD-Internal: CPUPL-3470
Change-Id: I016df622e3ae2be349683a1b5b60d6afe8a747ad
Enabled the ability to pass the libaoclutils repository path and
tag as parameters to the configure and CMake build tools

Change-Id: I2b0309249d68943893be0f0444d6c85187567f04
Changed the default netlib-test version from LAPACK 3.10.0 to
LAPACK 3.11

Signed-off-by: Jintu Das <Jintu.Das@amd.com>
Change-Id: I8e7d1f0cd8721fe424e12d5a5476bafd13f5f4fc
- Pass --imatrix parameter with values 'I' for infinity and 'N'
  for NaN in command line execution
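A sketch of how an --imatrix style option could inject the special values into a test matrix. The helper name, its signature, and the choice of which element to overwrite are assumptions for illustration, not the test suite's actual code.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: overwrite element (row, col) of a column-major
 * matrix with a special value selected by an --imatrix style character:
 * 'I' for infinity, 'N' for NaN. Any other character leaves the matrix
 * unchanged. Returns 1 if a value was injected, 0 otherwise. */
static int inject_special_value(char imatrix, double *a, int lda, int row, int col)
{
    double *elem = &a[(size_t)row + (size_t)col * (size_t)lda];
    switch (imatrix) {
    case 'I':
        *elem = INFINITY;
        return 1;
    case 'N':
        *elem = NAN;
        return 1;
    default:
        return 0;
    }
}
```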

AMD-Internal: CPUPL-3170
Change-Id: I6db4d87023fa790d59065ed3a3228cd496135fbc
Added stdc++ dependency to the libflame shared library.
This reflects stdc++ as one of the dependencies
needed to use the libFLAME library.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
AMD-Internal: CPUPL-2129
Change-Id: I1742876fcedb7e13729e5ce6abe73c8125d6d8fb
To match netlib LAPACK behaviour, ensure diagonal is real, even when
exiting with error for matrix being non-positive-definite or diagonal
being NaN.

AMD-Internal: [CPUPL-3530]
Change-Id: Ic82452517e13c1222ab1aacd1dc76d9c2a26d6d2
To match netlib LAPACK {c,z}lauu2.f behaviour, ensure diagonal is real
for elements 1..N-1 in libFLAME equivalent function FLA_Ttmm.

AMD-Internal: [CPUPL-3531]
Change-Id: I3472db9951d38e0b0adfa805ee514b396c19ae5b
Adding Log and trace with timing to double complex precision APIs (files starting with z)
Minor bug fix patch to remove ipiv, jpiv and jpvt

Functions: zcgesv

Signed-off-by: bsinghpa<brijesh.singhpanwar@amd.com>
AMD-Internal: CPUPL-2422
Change-Id: If3c864efa644a6390470fc948cb149e7ca3cc166
Adding Log and trace with timing to double complex precision APIs (files starting with z)

Functions:zhetri_3x,zhetri_rook,zhetrs,zhetrs2,
zhetrs_3,zhetrs_aa,zhetrs_aa_2stage,
zhetrs_rook,zhfrk,zhpcon,zhpev,zhpevd,
zhpevx,zhpgst,zhpgv,zhpgvd,zhpgvx,zhprfs,zhpsv,zhpsvx

Signed-off-by: bsinghpa<brijesh.singhpanwar@amd.com>
AMD-Internal: CPUPL-2422
Change-Id: I32cef9ba1884656a6be8372f53b8f886a2d5ab5e
1. Added a BLAS-free implementation of zgetrf for small matrix sizes
(1 to 22).
2. Changed the path to the LAPACK implementation of zgetrf for matrix
sizes between 22 and 50.
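The size ranges above can be read as a simple dispatch. This is a hypothetical sketch: the enum and function names are illustrative, it assumes a square n-by-n matrix, and because the text gives both "1 to 22" and "22 to 50", it assumes n = 22 takes the small BLAS-free path.

```c
/* Hypothetical path identifiers, not libflame names. */
enum zgetrf_path {
    ZGETRF_SMALL_BLAS_FREE,   /* 1 <= n <= 22 */
    ZGETRF_LAPACK_REFERENCE,  /* 22 < n <= 50 */
    ZGETRF_DEFAULT            /* everything else */
};

/* Choose a zgetrf code path from the matrix size n (square assumed). */
static enum zgetrf_path choose_zgetrf_path(int n)
{
    if (n >= 1 && n <= 22)
        return ZGETRF_SMALL_BLAS_FREE;
    if (n > 22 && n <= 50)
        return ZGETRF_LAPACK_REFERENCE;
    return ZGETRF_DEFAULT;
}
```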

Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>

Change-Id: Icdc07e0905e3bb31a194467aefea9b4b78cc1f27
Modified DLARF, DORG2R and DORGQR API for smaller size inputs.

Signed-off-by: Jintu Das <Jintu.Das@amd.com>
AMD-Internal: CPUPL-2601
Change-Id: I468ca2785a1bf94c0eea1a0045119be6f4aeee89
Modified the non-zero verification of pivot element and
changed the logic for division by pivot element to fix the
netlib failures observed with zgetrf small path implementation.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
Change-Id: Ieb3150ad381453159e65b544a8c386479df2d8e6
Enabled the original optimized small-path code only for the Windows build.
The previous version, with the fix for Linux, increased the total errors on
Windows.
Fixed a compilation error when building without AMD optimizations.
TODO: Analyze and fix the difference in Netlib test results between
Windows and Linux.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
Change-Id: I6ac0c420f98879d7c4a0f647f6fcb4837968fbe2
Adding Log and trace with timing to double complex precision APIs (files starting with z)

Functions:
ztpqrt2,ztprfb,ztprfs,ztptri,ztptrs,
ztpttf,ztpttr,ztrcon,ztrevc,ztrevc3,
ztrexc,ztrrfs,ztrsen,ztrsna,ztrsyl,
ztrtrs,ztrttf,ztrttp,ztzrqf,ztzrzf,zunbdb,

Signed-off-by: bsinghpa<brijesh.singhpanwar@amd.com>
AMD-Internal: CPUPL-2422
Change-Id: Ieffdff2b8733c5033266b1d21d100fa5f7896c87
Adding Log and trace with timing to double complex precision APIs (files starting with z)

Functions:
ztbtrs,ztfsm,ztftri,ztfttp,ztfttr,
ztgevc,ztgex2,ztgexc,ztgsen,ztgsja,
ztgsna,ztgsy2,ztgsyl,ztpcon,ztplqt,
ztplqt2,ztpmlqt,ztpmqrt,ztpqrt

Signed-off-by: bsinghpa<brijesh.singhpanwar@amd.com>
AMD-Internal: CPUPL-2422
Change-Id: I16c5078425c32af907d85dcba1a57b002e31dfc6
varajago and others added 30 commits October 10, 2023 03:22
Implicit function declarations were causing errors with
AOCC 4.2 Build 140. Updated the test suite to remove those errors.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
AMD-Internal: CPUPL-3969
Change-Id: I1b5c2e0f59f8f5cc4927b22c7c4ac74502cdb794
1. Added an AVX-512 implementation of ZGETRF.
2. Updated the AVX2 implementation of ZGETRF to improve performance.
3. Made additional changes to remove declaration warnings.

Jira ticket : CPUPL-3611
Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: I0e604d961ca18804766ebf655daf2c6dfa616d8b
Optimization of DGESVD paths 10 and 1T for small sizes.
Reused the optimized bidiagonalization code from the path 6T optimization.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
AMD-Internal: CPUPL-3251
Change-Id: Ib70822a5f89871a6d749abc4b1039d6e44d45f8f
The HSEQR API was failing in the main test suite for cases
where not all the eigenvalues had converged. When
such a condition arises, the code should not check
for negative test case failures. Added a condition
to skip the negative test check when such a case
occurs.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
AMD-Internal: CPUPL-4032
Change-Id: Iad948a44c0a6fdbaab00d0d1f08a8777f07fcc55
When AOCL-BLAS linking is selected during the build, the BLAS
API declarations have to be taken from the blis.h header of the AOCL-BLAS
library instead of AOCL-LAPACK's internal header file. Updated
several source and header files accordingly.

AMD-Internal: CPUPL-3826
Change-Id: I1f08d4d6ee6423f52cf33111d204fa673f9eb361
A linking error was observed when the libFLAME library was built without the "--enable-amd-opt" option, because checks to exclude compilation of a few optimized API functions were missing. Added the missing checks to fix the issue.

CPUPL-4119
Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: Ifbabc9919dc7887592fe91d738c89738e2174ccf
Optimization for path 10 in DGESVD led to errors
in netlib tests. Modified the logic to fix the
issues. Added a separate optimized code path for
path 1T, which was sharing the same code as path 10.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
AMD-Internal: CPUPL-3251
Change-Id: I3f5415bfe7d5e5dc4cc3b026560332738cc6ce19
Fixed the memory leak issue in main testsuite.

Signed-off-by: Parag <parsharm@amd.com>
AMD-Internal: CPUPL-4040
Change-Id: Ia6f06bb5f3252090ab523a4365c2aca69bbb5ef0
Added AVX2 implementation of DGETRS for NOTRANS.
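For reference, the computation DGETRS performs for trans = 'N' can be sketched in scalar C (this is not the AVX2 kernel; the function name is hypothetical). Given A = P*L*U from DGETRF, it applies the row interchanges to B, then does a unit-lower forward substitution and an upper back substitution.

```c
#include <stddef.h>

/* Solve A * X = B (no transpose) from LU factors: a holds unit-lower L
 * and upper U, column-major; ipiv holds 1-based row interchanges.
 * b is n-by-nrhs, column-major with leading dimension ldb, and is
 * overwritten with X. */
static void dgetrs_notrans(int n, int nrhs, const double *a, int lda,
                           const int *ipiv, double *b, int ldb)
{
    for (int r = 0; r < nrhs; r++) {
        double *x = b + (size_t)r * (size_t)ldb;
        /* 1. apply the row interchanges in order: x = P^T b */
        for (int i = 0; i < n; i++) {
            int p = ipiv[i] - 1;
            if (p != i) {
                double t = x[i];
                x[i] = x[p];
                x[p] = t;
            }
        }
        /* 2. forward substitution with unit lower-triangular L */
        for (int j = 0; j < n; j++)
            for (int i = j + 1; i < n; i++)
                x[i] -= a[i + j * lda] * x[j];
        /* 3. back substitution with upper-triangular U */
        for (int j = n - 1; j >= 0; j--) {
            x[j] /= a[j + j * lda];
            for (int i = 0; i < j; i++)
                x[i] -= a[i + j * lda] * x[j];
        }
    }
}
```

For A = [[2, 1], [4, 3]], DGETRF gives L = [[1, 0], [0.5, 1]], U = [[4, 3], [0, -0.5]] and ipiv = {2, 2}; solving with b = {4, 10} yields x = {1, 2}.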

Signed-off-by: Jintu Das <Jintu.Das@amd.com>
AMD-Internal: CPUPL-3660
Change-Id: I5ff38241c99d139e0f834f52fce2b55004ac4470
Usage of FLA_AMD_OPT removed and replaced with FLA_ENABLE_AMD_OPT
with '#if' convention instead of '#ifdef'.
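The difference between the two conventions matters when a build defines the macro to 0. A minimal illustration with a hypothetical macro `DEMO_ENABLE_OPT` (not the libflame macro): `#ifdef` only tests whether the name is defined, while `#if` tests its value.

```c
/* Hypothetical macro: a build passing -DDEMO_ENABLE_OPT=0 intends the
 * optimization to be OFF. */
#define DEMO_ENABLE_OPT 0

/* '#ifdef' only asks whether the macro exists, so 0 still counts as on. */
static int enabled_by_ifdef(void)
{
#ifdef DEMO_ENABLE_OPT
    return 1;
#else
    return 0;
#endif
}

/* '#if' evaluates the value, so 0 means off (an undefined name is also 0). */
static int enabled_by_if(void)
{
#if DEMO_ENABLE_OPT
    return 1;
#else
    return 0;
#endif
}
```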

Signed-off-by: Vasanthakumar R <varajago@amd.com>
Change-Id: Idbf4b5f385ed2167198c4a2b47340c1ab2a8778b
The legacy test suite CMake build fails when linked with single-threaded (ST) BLIS and ST libFLAME. The causes of the failures are:
	- A few OpenMP routines are referenced in ST libFLAME as part of the FLA_PROGRESS feature.
	- The pthread library is not included in the legacy test suite CMake file, although the BLIS library depends on it.

Made changes in the FLA_PROGRESS header file and the legacy test suite CMake file to handle both issues.

CPUPL-3899
Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: I66d25d509e693bfbdb48157034edfc22d2ab1eb1
Added the missing statement in validation code and test code.

Signed-off-by: Parag <parsharm@amd.com>
AMD-Internal: CPUPL-3759
Change-Id: I8bd5179d7c6f1a4e0d32faad5d2ff614df68eb4f
- Removed redundant code related to PIC flags and the GCOV flag.
- Updated BUILD.md to describe generating libflame with the AOCC compiler linked with libiomp5.so.
- -std=c11 causes some issues when generating the main test suite.

Signed-off-by: bsinghpa<brijesh.singhpanwar@amd.com>
AMD-Internal: CPUPL-3878
Change-Id: I80d31057644bfcb7737fb47145dc27389442ccd2
Memory corruption in the dgesvdx test happens in the dstebz API
due to the mapping of a size-N SVD problem to a size-2N eigenproblem.
Handled it with an additional buffer for the eigenproblem
in DBDSVDX, copying only the required singular values to the
SVD array at the end.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
AMD-Internal: CPUPL-4077
Change-Id: Ied04d74cbfb655ca486857754d3e0bfda2112993
1. Set AVX2 as the default compilation flag. This gives better performance for functions implemented in high-level language code.
2. Fixed the CMAKE_INSTALL_PREFIX path on Windows so that it can be customized.

AMD Internal : [CPUPL-3395]

Change-Id: I78bb2a77f951f5f40088dd15f739fbc32e081e6c
Previously, the AOCL-Utils library was merged into the AOCL-LAPACK
library by default, using its object files. This behaviour is now
changed: the merge happens only when explicitly requested with the
"ENABLE_EMBED_AOCLUTILS" build option, in both the CMake and
autoconfigure build modes. Otherwise, by default, AOCL-Utils is not
merged into AOCL-LAPACK, and applications have to link with AOCL-Utils
explicitly.

With this change, the default build requires the following:
- For the CMake based build, ensure the header file path of AOCL-Utils
  is set using the LIBAOCLUTILS_INCLUDE_PATH option.
- For the autoconfigure makefile build, ensure the header file path of
  AOCL-Utils is set in CFLAGS before running the make command.
- Applications using AOCL-LAPACK have to link with AOCL-Utils
  explicitly.

AMD-Internal: CPUPL-4020
Change-Id: I668814b12baacd8d793ef7de3145b6277d7297f0
Optimization of DGESVD path 6 using previously
optimized modules and specific optimizations.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
AMD-Internal: CPUPL-3251
Change-Id: I187d26876553ed75e5a46be712b0d06cfb61f05a
Added new test API to verify LAPACK SYEVX API functionality

AMD-Internal: CPUPL-3991
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: Id01bb7084eb172a4cb40058c0d0a6b41095cd049
Excluded the function that gets the thread count from the initialization
function, aocl_fla_init. This avoids overhead in single-thread mode and
in cases where multi-threading is not used for specific size
ranges.
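A minimal sketch of the idea, with hypothetical names (this is not the aocl_fla_init code): the thread count is queried lazily on first use, so an initialization routine and purely single-threaded paths never pay for it. Reading OMP_NUM_THREADS is an assumption for illustration.

```c
#include <limits.h>
#include <stdlib.h>

/* Not thread-safe as written; a real library would guard the first
 * query, for example with pthread_once or C11 call_once. */
static int cached_thread_count = 0;  /* 0 means "not queried yet" */
static int thread_queries = 0;       /* counts queries, for illustration */

/* Assumption: the count comes from OMP_NUM_THREADS, defaulting to 1. */
static int query_thread_count(void)
{
    thread_queries++;
    const char *env = getenv("OMP_NUM_THREADS");
    long v = env ? strtol(env, NULL, 10) : 1;
    return (v > 0 && v <= INT_MAX) ? (int)v : 1;
}

/* Hypothetical init: no longer queries the thread count. */
static void demo_init(void)
{
}

/* Query once, on first use, then reuse the cached value. */
static int get_thread_count(void)
{
    if (cached_thread_count == 0)
        cached_thread_count = query_thread_count();
    return cached_thread_count;
}
```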

Also included a fix for ctest to set the path of aocl-utils for netlib tests

AMD Internal : CPUPL-4009

Change-Id: I947298525c9771fb0f51ab8df3ccb55b58ea1e16
Added a new variant of multi-threaded ZGETRF. This variant makes use of AOCL-BLAS multi-thread framework.

This path will be taken only when FLA_ENABLE_AOCL_BLAS is enabled as there is a dependency on AOCL_BLAS framework APIs.

Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: Ifdd6cedc3f6892167991802a0ac3ba5ef6eaf54e
Code changes to fix an abrupt execution halt in the syevx test
on Windows

AMD-Internal: CPUPL-3991
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: I64a5210f01b167f544e848cc0b8a24a44207416f
CMake of sample code updated to take aoclutils library
through option AOCLUTILS_LIBRARY_PATH

AMD Internal : CPUPL-3826

Change-Id: I28b5e337b477087c5d2c56dfc780063ea3495829
Fixed CTEST commands for syevx and zgetrf
Fixed cmake build command typo in test/example/ReadMe.txt

AMD-Internal: CPUPL-4181
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: I6c4fadcbe4d4c9ebe86cfaa2aa1e0a616ad62271
Passing an incorrect value to the LF_ISA_CONFIG CMake flag caused a wrong value to be set for LF_ISA_CONFIG. Updated CMake to throw an error instead.

AMD Internal : [CPUPL-4207]

Change-Id: Id1d27951e796f0600d4f5442ef168a10629d232c
The AOCL-LAPACK library depends on the AOCL-Utils library. Updated the library
and test suite build documentation with information on the additional
build flags to set the AOCL-Utils library path.

AMD Internal : CPUPL-3826

Change-Id: If96a433ab676fcdc1967db9540245c08d755db40
Updated the CMake build configuration to use the C11 standard instead of gnu99, matching the autotools configure/make build method. With this change,
the previously failing Netlib tests pass. In addition, added the -D_GNU_SOURCE flag, which is needed by some of the POSIX standard pthread functions used by blis.h, which is referenced in the main test suite.

AMD Internal : [CPUPL-4214]

Change-Id: I7f6190ab053994ca8576d7d269fe7d3fdf20a3e0
A few more flags were updated in CMake for when the user sets an external OpenMP
library path. These new flags ensure that only the needed OpenMP library is
loaded and that it is picked only from the user-specified path.
Minor Readme edits.

AMD Internal : CPUPL-4220

Change-Id: I5ba43e304c0c241d3542b68bfa754952ae9810ab
Version of AOCL-LAPACK upgraded to 4.2.0 from 4.1.0

Signed-off-by: Vasanthakumar R <varajago@amd.com>
Change-Id: I91bf53fb2f1ff943072e154ed39dc225698e6de3
Link aocl-utils library as needed for shared library build for
both Linux and Windows

Change-Id: If52c00410d131236fac8821f9a25168771f366b4
Change-Id: Ica56467b4e995f0be131c0d247801feabbdb637f

5 participants