{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":16143904,"defaultBranch":"master","name":"blis","ownerLogin":"flame","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-01-22T15:58:24.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/6494486?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1716755879.0","currentOid":""},"activityList":{"items":[{"before":"950d3099075e97a811806aae19b8abee501167b8","after":"ed57f0fc2a48543146205d6d5503822a95140f0b","ref":"refs/heads/stable","pushedAt":"2024-05-29T20:39:35.000Z","pushType":"push","commitsCount":9,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Fix incorrect commenting of `BLIS_RNTM_INITIALIZER` and\n`BLIS_OBJECT_INITIALIZER`.\n- (cherry picked from a316d2c6c33fc1f8f7c58c4210ab203f48349041)\n\nUpdate BLIS_*_INITIALIZER macros for C++ compatibility. (#802)\n\nDetails:\n- Remove designated initializer syntax. This isn't officially supported\n until C++20.\n- Arrange initializers in the order in which they are defined in the\n struct. Even with standard or extension support for designated\n initializers, initializing non-static members out-of-order is an\n error in C++.\n- Remove the conditional code which uses '-1' as the default value of\n the 'pack_buf' member of 'mem_t' in C, but 'BLIS_BUFFER_FOR_GEN_USE'\n in C++. Simply use the latter as a common-sense default.\n- (cherry picked from 664cc6bc3ea610b4ecea63d78c6024c48f045635)\n\nAdd cpu part codes for various manufacturers and use in the code (#794)\n\n* Add cpu_id symbols for arm v8.\n\n* Add symbols for arm v7.\n\n* Always assume firestorm on Apple aarch64.\n\n* Fixes incorrect usage of model vs. 
part in some places.\n\n* Fixes #793\n- (cherry picked from 1a8c8180b32cf5988bf9eb5d2f0f8111a729993a)\n\nFix errors and typos in docs/BLIS*API.md (#791)\n\nDetails:\n- Fixed errors and unified formatting in docs/BLIS*API.md docs.\n- (cherry picked from c382d8bdccc07e22a341fe04960f0cbf4eec083b)\n\nInclude bli_config.h before bli_system.h in cblas.h. (#789)\n\nDetails:\n- Previously, in cblas.h, bli_config.h was being #included *after*\n bli_system.h, which meant that the BLIS_ENABLE_SYSTEM macro was\n never defined in time for proper OS detection. This bug only\n affected cblas.h -- blis.h had been correctly #including\n bli_config.h before bli_system.h since fb93d24. Thanks to\n Edward Smyth for reporting this bug and suggesting the fix.\n- (cherry picked from a72e4569f2a03cc3578c019bf7ce25491a44137d)\n\nInstall helper headers to INCDIR prefix. (#787)\n\nDetails:\n- Install one-line headers to INCDIR whose entire purpose is to\n #include the actual headers within the local 'blis' header directory\n so that applications can #include \"blis.h\" instead of #include\n (and/or \"cblas.h\" instead of if CBLAS is\n enabled) when headers are installed to global paths. (Note that\n INCDIR is the installation prefix for headers as specified by\n '--includedir=INCDIR', which defaults to 'PREFIX/include' if not\n specified.) 
Not sure how this problem went unreported for so long,\n since presumably any user trying to #include \"blis.h\" from a global\n installation would have encountered a compiler error.\n- The one-line blis.h and cblas.h headers now reside in the 'build'\n directory, ready to install as is.\n- Thanks to Jed Brown for reporting this via Issue #786, and to\n Devin Matthews and Mo Zhou for their engagement.\n- Harmonized the rule in the top-level Makefile for installing blis.pc\n into SHAREDIR/pkgconfig with conventions for others vis-a-vis\n verbosity/non-verbosity.\n- (cherry picked from 141a6c9a8e7557d9c7d28aecedec9dc5377dba13)\n\nAllow users to define [sd]complex using std::complex (#784)\n\nDetails:\n- In C++ applications, it makes a lot of sense to interface to BLIS\n using C++'s standard complex number library, which uses a template\n class std::complex. Obviously BLIS doesn't know anything about this\n and defaults to a custom struct to represent complex numbers. This PR\n updates the bli_[cz]{real,imag}() functions to accept std::complex\n numbers when a C++ compiler is being used.
Note that this has no\n effect on the compilation of the BLIS library (or testsuite), and only\n comes into play when including blis.h into a C++ project and forcing\n the use of std::complex for scomplex and dcomplex.\n- The application can explicitly request std::complex-based types via:\n\n #define BLIS_ENABLE_STD_COMPLEX\n #include \n // Call BLIS functions using std::complex here.\n\n- Fixed a bug in the definition of some scalar level-0 macros, since\n bli_creal()/bli_cimag() and bli_zreal()/bli_zimag() are no longer\n interchangeable.\n- (cherry picked from 2d9439298b336aa6d0ee000a5285a3adb4e6d462)\n\nCREDITS file update.\n\n- (cherry picked from f7ce54a252028483e4c6af619015eb22063d5541)","shortMessageHtmlLink":"Fix incorrect commenting of BLIS_RNTM_INITIALIZER and"}},{"before":null,"after":"097ca4ef49ade0fe8fa4671f79deb1826379531c","ref":"refs/heads/stable-nov3-cand0","pushedAt":"2024-05-26T20:37:59.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Added 'sifive_x280' subconfig, kernel set. (#737)\n\nDetails:\n- Added a new 'sifive_x280' subconfiguration for SiFive's x280 RISC-V\n instruction set architecture.
The subconfig registers kernels from a\n correspondingly new kernel set, also named 'sifive_x280'.\n- Added the aforementioned kernel set, which includes intrinsics- and\n assembly-based implementations of most level-1v kernels along with\n level-1f kernels axpy2v and dotaxpyv, packm kernels, and level-3 gemm,\n gemmtrsm_l, and gemmtrsm_u microkernels (plus supporting files).\n- Registered the 'sifive_x280' subconfig as belonging to a singleton\n family by the same name.\n- Added an entry to '.travis.yml' to test the new subconfig via qemu.\n- Updates to 'travis/do_riscv.sh' script to support the 'sifive_x280'\n subconfig and to reflect updated tarball names.\n- Special thanks to Lee Killough, Devin Matthews, and Angelika Schwarz\n for their engagement on this commit.\n- (cherry picked from commit 05388ddb66f8bf2d62009b162d64bf2d99226b83)\n\nFixed HPX barrier synchronization (#783)\n\nDetails:\n- Fixed hpx barrier synchronization. HPX was hanging on larger cores\n because blis was using non-hpx synchronization primitives. But when\n using hpx-runtime only hpx-synchronization primitives should be used.\n Hence, a C style wrapper hpx_barrier_t is introduced to perform hpx\n barrier operations.\n- Replaced hpx::for_loop with hpx::futures. Using hpx::for_loop with\n hpx::barrier on n_threads greater than actual hardware thread count\n causes synchronization issues that make hpx hang. This can be avoided\n by using hpx::futures, which are relatively very lightweight, robust\n and scalable.\n- (cherry picked from 7a87e57b69d697a9b06231a5c0423c00fa375dc1)\n\nFixed bug in sup threshold registration.
(#782)\n\nDetails:\n- Fixed a bug that resulted in BLIS non-deterministically calling the\n gemmsup handler, irrespective of the thresholds that are registered\n via bli_cntx_set_blkszs().\n- Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup\n thresholds (BLIS_[MNK]T blocksizes) were being set to zero so that no\n operation ever matched the criteria for gemmsup (unless specific sup\n thresholds are registered). HOWEVER, these thresholds are set via\n bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was\n only copying the thresholds into the gks' cntx_t if the values were\n strictly positive. Thus, the zero values passed into\n bli_cntx_set_blkszs() were being ignored and those threshold slots\n within the gks were left uninitialized. The upshot of this is that the\n reference gemmsup handler was being called for gemm problems\n essentially at random (and as it turns out, very rarely the reference\n gemmsup implementation would encounter a divide-by-zero error).\n- The problem was fixed by changing bli_blksz_copy_if_pos() so that it\n copies values that are non-negative (values >= 0 instead of > 0). The\n function was also renamed to bli_blksz_copy_if_nonneg().\n- Also needed to standardize use of -1 as the sole value to embed into\n blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register\n a value for that slot (and instead let whatever existing values\n remain). This required updates to the bli_cntx_init_*() functions for\n bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some\n of these codes were using 0 instead of -1.\n- Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and\n proposing a fix for this issue.\n- (cherry picked from 8fff1e31da1c87e46cacec112b0ac280ab47cd8b)\n\nUpdate zen3 subconfig to support NVHPC compilers.
(#779)\n\nDetails:\n- Parse $(CC_VENDOR) values of \"nvc\" in 'zen3' make_defs.mk file.\n- Minor refactor to accommodate above edit.\n- CREDITS file update.\n- (cherry picked from 1e264a42474b535431768ef925bbd518412d392e)\n\nFixed brokenness when sba is disabled. (#777)\n\nDetails:\n- Previously, disabling the sba via --disable-sba-pools resulted in a\n segfault due to a sanity-check-triggering abort(). The problem was\n that the sba, as currently used in the l3 thread decorators, did not\n yet (fully) support pools being disabled. The solution entailed\n creating a wrapper function, bli_sba_array_elem(), which either calls\n bli_apool_array_elem() (when sba pools are enabled at configure time)\n or returns a NULL sba_pool pointer (when sba pools are disabled), and\n calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note\n that the NULL pointer returned by bli_sba_array_elem() when the sba\n pools are disabled does no harm since in that situation the pointer\n goes unreferenced when acquiring and releasing small blocks. Thanks to\n John Mather for reporting this bug.\n- Guarded the bodies of bli_sba_init() and bli_sba_finalize() with\n #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary\n to fix the aforementioned bug, but it seems like good practice.\n- Moved the code in bli_l3_thrinfo_create() that checked that the array*\n pointer is non-NULL before calling bli_sba_array_elem() (previously\n bli_apool_array_elem()) into the definition of bli_sba_array_elem().\n- Renamed various instances of 'pool' variables and function parameters\n to 'sba_pool' to emphasize what kind of pool it represents.\n- Whitespace changes.\n- (cherry picked from c2099ed2519dcac8ee421faf999b36e1c2260be7)\n\nImplemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778)\n\nDetails:\n- Expanded existing BLAS compatibility APIs to provide interfaces to\n [cz]symv_(), [cz]syr_().
This was easy since those operations were\n already implemented natively in BLIS; the APIs were previously\n omitted only because they were not formally part of the BLAS.\n- Implemented [cz]rot_() by feeding code from LAPACK 3.11 through\n f2c.\n- Thanks to James Foster for pointing out that LAPACK contains these\n additional symbols, which prompted these additions, as well as for\n testing the [cz]rot_() functions from Julia's test infrastructure.\n- CREDITS file update.\n- (cherry picked from 37ca4fd168525a71937d16aaf6a13c0de5b4daef)\n\nFixes to HPC runtime code path. (#773)\n\nDetails:\n- Fixed hpx::for_each invocation and replaced it with hpx::for_loop. The HPX\n runtime was initialized using hpx::start, but the hpx::for_each\n function was being called on a non-hpx runtime (i.e., the standard BLIS\n runtime - single main thread). To run hpx::for_each on HPX runtime\n correctly, the code now uses hpx::run_as_hpx_thread(func, args...).\n- Replaced hpx::for_each with hpx::for_loop, which eliminates use of\n hpx::util::counting_iterator.\n- Employ hpx::execution::chunk_size(1) to make sure that a thread\n resides on a particular core.\n- Replaced hpx::apply() with updated version hpx::post().\n- Initialized tdata->id to 0 in libblis.c, as it is the main thread\n and is needed for writing results to the output file.\n- By default, if not specified, the HPX runtime uses all N threads/cores\n available in the system. But, if we want to use only n_threads out of\n N threads, we use hpx::execution::experimental::num_cores(n_threads).\n- (cherry picked from a4a63295b96ed5b32f4df6477d24db07bf431202)\n\nFixed broken link in Multithreading.md. (#774)\n\nDetails:\n- Replaced 404'd link in docs/Multithreading.md with an archive from\n The Wayback Machine.\n- CREDITS file update.\n- (cherry picked from c6546c1131b1ddd45ef13f9f2b620ce2e955dbf8)","shortMessageHtmlLink":"Added 'sifive_x280' subconfig, kernel set.
(#737)"}},{"before":null,"after":"961e998fd27a9a5273b53a495532079671005a10","ref":"refs/heads/stable-aug27-cand0","pushedAt":"2024-05-26T20:37:59.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Revamped bli_init() to use TLS where feasible. (#767)\n\nDetails:\n- Revamped bli_init_apis() and bli_finalize_apis() to use separate\n bli_pthread_switch_t objects for each of the five sub-API init\n functions, with the objects for the 'ind' and 'rntm' sub-APIs being\n declared with BLIS_THREAD_LOCAL. This allows some APIs to be treated\n as thread-local and the rest as thread-shared. Thanks to Edward Smyth\n for requesting application thread-specific rntm_t structs, which\n inspired these changes.\n- Combined bli_thread_init_from_env() and bli_pack_init_from_env() into\n a new function, bli_rntm_init_rntm_from_env(), and placed the combined\n code in bli_rntm.c inside of a new bli_rntm_init() function.
Then\n removed the (now empty) bli_pack_init() and _finalize() function defs.\n- Deprecated bli_rntm_init() for the purposes of initializing a rntm_t\n (temporarily preserving it as bli_rntm_clear() in a cpp-undefined code\n block) so that the function name could be used for the aforementioned\n bli_rntm_init() function.\n- Updated libblis_test_pobj_create() in test_libblis.c to use a static\n rntm_t initializer instead of the deprecated bli_rntm_init()\n function-based option.\n- Minor updates to docs/Multithreading.md, including removal of\n bli_rntm_init() in the example of how to initialize rntm_t structs.\n- Changed the return value of bli_gks_init(), bli_ind_init(),\n bli_memsys_init(), bli_thread_init(), and bli_rntm_init() (and their\n finalize() counterparts) from 'void' to 'int' so that those functions\n match the function type expected by bli_pthread_switch_on()/_off().\n Those init/finalize functions now return 0 to indicate success, which\n is needed so that the switch actually changes state from off to on\n and vice versa.\n- Defined bli_thread_reset(), which copies the contents of the\n global_rntm_at_init() struct into the global_rntm struct (for the\n current application thread).\n- Guard calls to bli_pthread_mutex_lock()/_unlock() in\n - bli_pack_set_pack_a() and _pack_b()\n - bli_rntm_init_from_global()\n - bli_thread_set_ways()\n - bli_thread_set_num_threads()\n - bli_thread_set_thread_impl()\n - bli_thread_reset()\n - bli_l3_ind_oper_set_enable()\n with #ifdef BLIS_DISABLE_TLS (since TLS precludes the possibility of\n race conditions).\n- In frame/base/bli_rntm.c, declare global_rntm, global_rntm_at_init,\n and global_rntm_mutex as BLIS_THREAD_LOCAL so that separate\n application threads can change the number of ways of BLIS parallelism\n independently from one another.\n- Access global_rntm only via a new private (not exported) function,\n bli_global_rntm(). 
Defined a similar function for a rntm_t new to\n this commit, global_rntm_at_init, which preserves the state of the\n global rntm at initialization-time.\n- In frame/3/bli_l3_ind.c, added a guard to the declaration of the\n static variable oper_st_mutex with #ifdef BLIS_DISABLE_TLS so that the\n mutex is omitted altogether when TLS is enabled (which prevents the\n compiler from warning about an unused variable).\n- Removed redundant code from bli_thread.c:\n #ifdef BLIS_ENABLE_HPX\n #include \"bli_thread_hpx.h\"\n #endif\n since this code is already present in bli_thread.h.\n- Thanks to Minh Quan Ho for his review of and feedback on this commit.\n- Comment updates.\n- (cherry picked from commit 6dcf7666eff14348e82fbc2750be4b199321e1b9)","shortMessageHtmlLink":"Revamped bli_init() to use TLS where feasible. (#767)"}},{"before":null,"after":"ed57f0fc2a48543146205d6d5503822a95140f0b","ref":"refs/heads/stable-mar28-cand0","pushedAt":"2024-05-26T20:37:59.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Fix incorrect commenting of `BLIS_RNTM_INITIALIZER` and\n`BLIS_OBJECT_INITIALIZER`.\n- (cherry picked from a316d2c6c33fc1f8f7c58c4210ab203f48349041)\n\nUpdate BLIS_*_INITIALIZER macros for C++ compatibility. (#802)\n\nDetails:\n- Remove designated initializer syntax. This isn't officially supported\n until C++20.\n- Arrange initializers in the order in which they are defined in the\n struct. Even with standard or extension support for designated\n initializers, initializing non-static members out-of-order is an\n error in C++.\n- Remove the conditional code which uses '-1' as the default value of\n the 'pack_buf' member of 'mem_t' in C, but 'BLIS_BUFFER_FOR_GEN_USE'\n in C++. 
Simply use the latter as a common-sense default.\n- (cherry picked from 664cc6bc3ea610b4ecea63d78c6024c48f045635)\n\nAdd cpu part codes for various manufacturers and use in the code (#794)\n\n* Add cpu_id symbols for arm v8.\n\n* Add symbols for arm v7.\n\n* Always assume firestorm on Apple aarch64.\n\n* Fixes incorrect usage of model vs. part in some places.\n\n* Fixes #793\n- (cherry picked from 1a8c8180b32cf5988bf9eb5d2f0f8111a729993a)\n\nFix errors and typos in docs/BLIS*API.md (#791)\n\nDetails:\n- Fixed errors and unified formatting in docs/BLIS*API.md docs.\n- (cherry picked from c382d8bdccc07e22a341fe04960f0cbf4eec083b)\n\nInclude bli_config.h before bli_system.h in cblas.h. (#789)\n\nDetails:\n- Previously, in cblas.h, bli_config.h was being #included *after*\n bli_system.h, which meant that the BLIS_ENABLE_SYSTEM macro was\n never defined in time for proper OS detection. This bug only\n affected cblas.h -- blis.h had been correctly #including\n bli_config.h before bli_system.h since fb93d24. Thanks to\n Edward Smyth for reporting this bug and suggesting the fix.\n- (cherry picked from a72e4569f2a03cc3578c019bf7ce25491a44137d)\n\nInstall helper headers to INCDIR prefix. (#787)\n\nDetails:\n- Install one-line headers to INCDIR whose entire purpose is to\n #include the actual headers within the local 'blis' header directory\n so that applications can #include \"blis.h\" instead of #include\n (and/or \"cblas.h\" instead of if CBLAS is\n enabled) when headers are installed to global paths. (Note that\n INCDIR is the installation prefix for headers as specified by\n '--includedir=INCDIR', which defaults to 'PREFIX/include' if not\n specified.) 
Not sure how this problem went unreported for so long,\n since presumably any user trying to #include \"blis.h\" from a global\n installation would have encountered a compiler error.\n- The one-line blis.h and cblas.h headers now reside in the 'build'\n directory, ready to install as is.\n- Thanks to Jed Brown for reporting this via Issue #786, and to\n Devin Matthews and Mo Zhou for their engagement.\n- Harmonized the rule in the top-level Makefile for installing blis.pc\n into SHAREDIR/pkgconfig with conventions for others vis-a-vis\n verbosity/non-verbosity.\n- (cherry picked from 141a6c9a8e7557d9c7d28aecedec9dc5377dba13)\n\nAllow users to define [sd]complex using std::complex (#784)\n\nDetails:\n- In C++ applications, it makes a lot of sense to interface to BLIS\n using C++'s standard complex number library, which uses a template\n class std::complex. Obviously BLIS doesn't know anything about this\n and defaults to a custom struct to represent complex numbers. This PR\n updates the bli_[cz]{real,imag}() functions to accept std::complex\n numbers when a C++ compiler is being used.
Note that this has no\n effect on the compilation of the BLIS library (or testsuite), and only\n comes into play when including blis.h into a C++ project and forcing\n the use of std::complex for scomplex and dcomplex.\n- The application can explicitly request std::complex-based types via:\n\n #define BLIS_ENABLE_STD_COMPLEX\n #include \n // Call BLIS functions using std::complex here.\n\n- Fixed a bug in the definition of some scalar level-0 macros, since\n bli_creal()/bli_cimag() and bli_zreal()/bli_zimag() are no longer\n interchangeable.\n- (cherry picked from 2d9439298b336aa6d0ee000a5285a3adb4e6d462)\n\nCREDITS file update.\n\n- (cherry picked from f7ce54a252028483e4c6af619015eb22063d5541)","shortMessageHtmlLink":"Fix incorrect commenting of BLIS_RNTM_INITIALIZER and"}},{"before":null,"after":"280553d1f5810b46edd455c848f10907fe2bcea2","ref":"refs/heads/stable-aug19-cand0","pushedAt":"2024-05-22T21:21:00.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Fixed error when using common.mk from testsuite. (#768)\n\nDetails:\n- Commit 2db31e0 (#755) inserted logic into common.mk that attempts to\n preprocess build/detect/android/bionic.h to determine whether the\n __BIONIC__ macro is defined (in which case -lrt should not be included\n in LDFLAGS).
However, the path to bionic.h was encoded without regard\n to DIST_PATH, and so utilizing common.mk anywhere that isn't the top-\n level directory (such as in the testsuite directory) resulted in a\n compiler error:\n\n gcc: error: build/detect/android/bionic.h: No such file or directory\n gcc: fatal error: no input files\n compilation terminated.\n\n This commit adds a $(DIST_PATH) prefix to the path to bionic.h so that\n it can be located from other applications' Makefiles that use BLIS's\n makefile fragments.\n- (cherry picked from commit fa6a9b24ae2ddbd5f30f657d46004843581c768c)\n\nSet thrcomm timpl_t id inside init functions. (#766)\n\nDetails:\n- Previously, the timpl_t id being used when a thrcomm_t is being\n initialized was set within the bli_thrcomm_init() dispatch function\n after the timpl_t-specific bli_thrcomm_init_*() function returned. But\n it just occurred to me that each bli_thrcomm_init_*() function already\n intrinsically knows its own timpl_t value. This commit shifts the\n setting of the thrcomm_t.ti field into the corresponding\n bli_thrcomm_init_*() function for each timpl_t type (e.g. single,\n openmp, pthreads, hpx).\n- Removed long-deprecated code dating back nearly 10 years.\n- Whitespace changes\n- Comment updates.\n- (cherry picked from 634e532c8dcce7383d96ba33276df65c656b2198)\n\nSmall fixes/improvements to docs/Multithreading.md. (#764)\n\nDetails:\n- Added reminders that #include \"blis.h\" must be added to source files\n in order to access BLIS API function prototypes. Thanks to Barry Smith\n for suggesting this improvement.\n- Fixed pre-existing typos.\n- CREDITS file update.\n- (cherry picked from 3cf17b4a91232709bc6a205b0e4d7ecc96579aa9)\n\nCREDITS file update.\n\nDetails:\n- Thanks to Igor Zhuravlov for PR #753 (commit 915daaa).\n- (cherry picked from dbc79812c390f812c7bf030bfcf87e947a1443c4)\n\nFix typos in docs + example code comments. 
(#753)\n\nDetails:\n- Fixed various typos in API documentation in docs/BLIS*API.md and\n comments in the source code examples within examples/?api/*.c.\n- (cherry picked from 915daaa43cd189c86d93d72cd249714f126e9425)\n\nExclude -lrt on Android with Bionic libraries. (#755)\n\nDetails:\n- Added build/detect/android/bionic.h header to test whether the\n __BIONIC__ cpp macro is defined.\n- In common.mk, only add -lrt to LDFLAGS when Bionic is not present.\n- CREDITS file update.\n- (cherry picked from 2db31e057e7e9c97fc60021b5ae72a01a48d7588)\n\nSmall fixes to support hpx in the testsuite (#759)\n\nDetails:\n- Minor changes to test_libblis.c to support hpx.\n- (cherry picked from 22ad8c1b752364784f320168b31995945ad84a59)\n\nAuto-detect the RISC-V ABI of the compiler and use -mabi= during RISC-V Builds (#750)\n\nDetails:\n- Generate a build error if there is a 32/64-bit mismatch between the\n RISC-V ABI or architecture and the BLIS configuration selected.\n- Handle Q, Zicsr, ZiFencei, Zba, Zbb, Zbc, Zbs and Zfh extensions in\n the RISC-V architecture auto-detection. ZiFencei and Zicsr are not\n detectable with built-in RISC-V macros right now.\n- ZiFencei is not important for BLIS because it doesn't have\n Just-In-Time compilation or self-modifying code, and Zicsr is implied\n by the floating-point extensions, which are required for good\n performance in BLIS.\n- Move RISC-V autodetect header files to build/detect/riscv/.\n- (cherry picked from c91b41d022e33da82b3b06c82be047a29873d9b6)\n\nRewrote regen-symbols.sh (gen-libblis-symbols.sh). (#751)\n\nDetails:\n- Wrote an alternative to regen-symbols.sh, gen-libblis-symbols.sh,\n that generates a list of exported symbols from the monolithic blis.h\n file rather than peeking inside of the shared object via nm. (This new\n script lives in the 'build' directory and the older script has been\n retired to build/old.)
Special thanks to Devin Matthews for authoring\n gen-libblis-symbols.sh.\n- Added a 'symbols' target to the top-level Makefile which will refresh\n build/libblis-symbols.def, with supporting changes to common.mk.\n- Updates to build/libblis-symbols.def using the new symbol-generating\n script.\n- (cherry picked from a0b04e3c007f1207e5678bf20c07752906742fb7)\n\nFix 1m enablement for herk/her2k/syrk/syr2k. (#743)\n\nDetails:\n- Ever since 28b0982, herk, her2k, syrk, and syr2k have been implemented\n in terms of the gemmt expert API. And since the decision of which\n induced method to use (1m or native) is made *below* the level of the\n expert API, executing any of {herk,her2k,syrk,syr2k} results in BLIS\n checking the enablement status for gemmt.\n- This commit applies a band-aid of sorts to this issue by modifying\n bli_l3_ind_oper_get_enable() and bli_l3_ind_oper_set_enable() so that\n any attempts to query or modify the internal enablement status for\n herk, her2k, syrk, or syr2k instead does so for gemmt.\n- This solution isn't perfect since, in theory, the user could enable 1m\n for, say, herk but then disable it for syrk, and then be confused when\n herk runs via native execution. But we don't anticipate that users\n modify 1m enablement at the operation level, and so in practice this\n solution is likely fine for now.\n- (cherry picked from 89b7863fc9a88903917deedc6a5ad9fd17f83713)\n\nadd nvhpc compiler support (#719)\n\nAdd detection of the NVIDIA nvhpc compiler (`nvc`) in `configure`,\nand adjust some warning options in `config.mk`. Currently, no\nspecific options for `nvc` have been added in the relevant\nconfigurations so it may not be usable without further tweaks.\n- (cherry picked from 138de3b3e88c5bf7d8718c45c88811771cf42db8)","shortMessageHtmlLink":"Fixed error when using common.mk from testsuite. 
(#768)"}},{"before":null,"after":"65ef992284779273d55cd4e19007fdd8d767e1f6","ref":"refs/heads/stable-may7b-cand0","pushedAt":"2024-05-22T21:01:13.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Consolidate INSERT_ macro sets via variadic macros. (#744)\n\nDetails:\n- Consolidated INSERT_GENTFUNC_* (and corresponding GENTPROT) macro sets\n using variadic macros (__VA_ARGS__), which means we no longer need a\n different INSERT_ macro for each possible number of arguments the\n macro might take. This change seems reasonable given that variadic\n macros are a standard C99 feature and widely supported. I took care\n not to use variadic macros where 0 variadic arguments are expected\n since that is a non-standard extension.\n- Added pre-typecast parentheses to arithmetic expressions in printf()\n statements in bli_thread_range_tlb.c.\n- (cherry picked from 0873c0f6ed03fea321d1631b3d1a385a306aa797)","shortMessageHtmlLink":"Consolidate INSERT_ macro sets via variadic macros. (#744)"}},{"before":null,"after":"cbbccc83ce0d296e3b631e1e2fe79978d97891bf","ref":"refs/heads/stable-may7a-cand0","pushedAt":"2024-05-22T20:49:12.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Added missing #include <io.h> for Windows.
(#747)\n\nDetails:\n- This commit fixes issue #746, in which the _access() function (called\n from within blastest/f2c/open.c) is undeclared when compiling on\n Windows with clang 16.\n- (cherry picked from commit ef9d3e6675320a53e7cb477c16b01388e708b1da)\n\nFix bug in detecting Fortran compiler vendor (#745)\n\n`FC` was used instead of `found_fc`.\n- (cherry picked from 6fd9aabb03d172a792a7eeb106c7d965cf038421)\n\nApply #738 to make_defs.mk of RISC-V subconfigs. (#740)\n\nDetails:\n- PR #738 -- which moved -fPIC flag insertion responsibilities from\n common.mk to the subconfigs' individual make_defs.mk files -- was\n merged shortly before the introduction of new RISC-V subconfigs in\n #693. This commit brings those RISC-V subconfigs up to date with the\n new -fPIC conventions.\n- (cherry picked from 8215b02f99aa77ecc7d813508c247565115319d7)\n\nAdd RISC-V target (#693)\n\nDetails:\n- There are four RISC-V base configurations: 'rv32i', 'rv32iv', 'rv64i',\n and 'rv64iv', namely the 32-bit and 64-bit implementations with and\n without the 'V' vector extension. Additional extensions such as 'M'\n (multiplication), 'A' (atomics), 'F' ('float' hardware support), 'D'\n ('double' hardware support), and 'C' (compressed-length instructions),\n are automatically used when available. 
If they are not available, then\n software equivalents (e.g., softfloat and -latomic) are used.\n- './configure auto' can be invoked on a RISC-V build platform, and will\n automatically detect RISC-V CPU extensions through the RISC-V C API:\n https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md\n- The assembly kernels assume the presence of the vector extension\n RVV 1.0.\n- It is possible to build 'rv[32,64]iv' for any value of VLEN.\n However, if VLEN < 128, the targets will fall back to the generic\n kernels and blocksizes.\n- The vector microkernels are vector-length agnostic and work with\n every VLEN >=128, but are expected to work best with smaller vector\n lengths, i.e., VLEN <= 512.\n- The assembly kernels cover column major storage (rs_c == 1).\n- The blocksizes aim at being a good generic choice for out-of-order\n cores. They are not tuned to a specific RISC-V HPC core.\n- The vector kernels have been tested using vlen={128,256,512}.\n- The single- and double-precision assembly code routines for 'sgemm'\n and 'dgemm', or for 'cgemm' and 'zgemm', are combined in their RISC-V\n vector assembly source code, and are differentiated only with macros.\n- The XLEN=32 and XLEN=64 versions of the RISC-V assembly code are\n identical, except that callee-saved registers are saved and restored\n differently. There are RISC-V assembly code #include files for\n handling the saving and restoring of callee-saved registers, and they\n are future-proof if ever XLEN=128.\n- Multiplications, such as computing array strides and offsets, are\n performed in C, and later passed to the RISC-V assembly kernels. This\n is so that the compiler can determine whether the 'M' (multiply)\n extension is available and use multiplication instructions, or call\n library helper functions instead.\n- A new macro called bli_static_assert() has been added to perform\n static assertions at compile-time, regardless of the C/C++ dialect of\n the compiler. 
The original motivation for this was to ensure that\n calling RISC-V assembly kernels would not silently truncate arguments\n of type 'dim_t' or 'inc_t' (so-called \"narrowing conversions\").\n- RISC-V CI tests have been added to Travis CI, using the\n riscv-gnu-toolchain cross-compiler and the qemu simulator.\n- Thanks to Lee Killough for collaborating on this commit.\n- (cherry picked from 6b38c5ac07a2a27738674784e58aa699bf895447)","shortMessageHtmlLink":"Added missing #include <io.h> for Windows. (#747)"}},{"before":null,"after":"520fb89dafb27a614fe6b0e840cfb1bf705a7254","ref":"refs/heads/stable-apr8-cand0","pushedAt":"2024-05-22T20:43:38.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"CREDITS file update.\n\n- (cherry picked from 593d01761910af6a9a16ee0ac097142732f73c29)\n\nCREDITS file update.\n\nDetails:\n- Added attributions associated with commits:\n - 98d4678 9b1beec: @bartoldeman\n - 2b05948 059f151: @ct-clmsn\n- Reordered attribution for @decandia50.\n- (cherry picked from 259f68479671bbaf9c5986759aaa0004f9b05a24)\n\nOptionally disable thread-local storage. (#735)\n\nDetails:\n- Implemented a new configure option, --disable-tls, which allows the\n user to optionally disable the use of thread-local storage qualifiers\n on static variables in BLIS. This option will rarely be needed, but\n in some situations may allow BLIS to compile when TLS is unavailable.\n Thanks to Nick Knight for suggesting this option.\n- Unlike the --disable-system option, --disable-tls does not forcibly\n disable threading.
Instead, warnings of the possible consequences of\n using threading with TLS disabled are added to:\n - the output of './configure --help';\n - the output of 'configure' when the --disable-tls option is parsed;\n - the informational header output by the testsuite.\n Thanks to Minh Quan Ho for suggesting these warnings.\n- Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is\n defined to nothing when BLIS_ENABLE_TLS is not defined.\n- Defined bli_info_get_enable_tls(), which returns whether the cpp macro\n BLIS_ENABLE_TLS was defined.\n- Edited --disable-system configure status output for clarity.\n- Whitespace updates.\n- (cherry picked from aea8e1d9243631635ca788d5e14f0f29328e637d)\n\nAdd output.testsuite to .gitignore (#736)\n\nDetails:\n- Added `output.testsuite` to .gitignore since it was previously not\n being matched by `output.testsuite.*`.\n- (cherry picked from 3f1432abe75cc306ef90a04381d7e0d8739fded8)\n\nAdded mm_algorithm pdf files (bp and pb).\n\nDetails:\n- Added PDF versions of the PowerPoint files added in 17cd260.\n- (cherry picked from 38fc5237520a2f20914a9de8bb14d5999009b3fb)\n\nAdded mm_algorithm pptx files (bp and pb).\n\nDetails:\n- Added two PowerPoint files that contain slides depicting the classic\n Goto algorithm for matrix multiplication as well as its sister\n \"panel-block\" algorithm. These files reside in docs/diagrams.\n- (cherry picked from 17cd260cb504b2f3997c32daec77f4c828fbb32b)\n\nMove -fPIC insertion to subconfigs' make_defs.mk. (#738)\n\nDetails:\n- Previously, common.mk was appending -fPIC to the CPICFLAGS variables\n set within the various subconfigurations' make_defs.mk files.
This\n seemed somewhat unintuitive, and so now the -fPIC flag is assigned to\n the various subconfigs' CPICFLAGS variables in the respective\n make_defs.mk files.\n- This commit also changes the logic in common.mk so that instead of\n appending, the variable is overwritten, but now *only* in the case\n of Windows (since apparently -fPIC needs to be omitted there). Thanks\n to Nick Knight for catching and reporting this weirdness.\n- (cherry picked from 9d778e0f7c94d8752dd578101e4fc6893a1f54ef)","shortMessageHtmlLink":"CREDITS file update."}},{"before":null,"after":"f4a2c721e62e694e6da5444827e906fae5abed77","ref":"refs/heads/stable-mar27-cand0","pushedAt":"2024-05-22T20:43:38.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Fixed compile errors with `BLIS_DISABLE_BLAS_DEFS`. (#730)\n\nDetails:\n- This commit fixes a compile-time error related to the type definition\n (prototype) of dsdot_() when BLIS_DISABLE_BLAS_DEFS is defined by the\n application (or the configuration), which is actually a symptom of a\n larger design issue when disabling BLAS prototypes. The macro was\n intended to allow applications to bring their own BLAS prototypes and\n suppress the inclusion of duplicate (or possibly conflicting)\n prototypes within blis.h. However, prototypes are still needed during\n compilation even if they are ultimately omitted from blis.h. The\n problem is that almost every source file in BLIS--including the BLAS\n compatibility layer--only includes one header (blis.h), and if we\n were to #include a new header in the BLAS source files (to isolate\n only the BLAS prototypes), we would also have to make the build system\n aware of the location of those headers.
Thanks to Edward Smyth of AMD\n for reporting this issue.\n- The solution I settled upon was to remove all cpp guards from all BLAS\n headers (by changing them to #if 1, for easy search-and-replace\n anchoring in the future if we ever need to re-insert guards) and\n to modify bli_blas.h so that the BLAS prototypes are #included if\n either (a) BLIS_ENABLE_BLAS_DEFS is defined, or (b)\n BLIS_ENABLE_BLAS_DEFS is *not* defined but BLIS_IS_BUILDING_LIBRARY\n *is* defined. (Thanks to Devin Matthews for steering me away from an\n inferior solution.)\n- This commit also spins off the actual BLAS prototypes/definitions to\n a separate file, bli_blas_defs.h.\n- CREDITS file update.\n- (cherry picked from commit 04090df01175477394d1e73af2e5769751d47cd6)","shortMessageHtmlLink":"Fixed compile errors with BLIS_DISABLE_BLAS_DEFS. (#730)"}},{"before":null,"after":"bfdd6ebf459d3e684c0fe87a10e499b7e75f4de1","ref":"refs/heads/stable-mar24-cand1","pushedAt":"2024-05-22T00:40:59.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Omit -fPIC if shared library build is disabled. (#732)\n\nDetails:\n- Updated common.mk so that when the --disable-shared option is given to\n configure:\n 1. The -fPIC compiler flag is omitted from the individual\n configuration family members' CPICFLAGS variables (which are\n initialized in each subconfig's make_defs.mk file); and\n 2. The BUILD_SYMFLAGS variable, which contains compiler flags needed\n to control the symbol export behavior, is left blank.\n- The net result of these changes is that flags specific to shared\n library builds are only used when a shared library is actually\n scheduled to be built.
Thanks to Nick Knight for reporting this issue.\n- CREDITS file update.\n- (cherry picked from 5f841307f668f65b7ed5a479bd8374d2581208cf)\n\nUpdated configure to pass all shellcheck checks. (#729)\n\nDetails:\n- Modified configure so that it passes all 'shellcheck' checks,\n disabling ones which we violate but which are just stylistic, or are\n special cases in our code.\n- Miscellaneous other minor changes, such as rearranging redirections in\n long sed/perl pipes to look more natural.\n- Whitespace tweaks.\n- (cherry picked from 72c37eb80f964b7840377076e5009aec5b29d320)\n\nFixed bugs in scal2v ref kernel when alpha == 1. (#728)\n\nDetails:\n- Fixed a typo bug in ref_kernels/1/bli_scal2v_ref.c where the\n conditional that was supposed to be checking for cases when alpha is\n equal to 1.0 (so that copyv could be used instead of scal2v) was\n instead erroneously comparing alpha against 0.0.\n- Fixed another bug in the same function whereby BLIS_NO_CONJUGATE was\n erroneously being passed into copyv instead of the kernel's conjx\n parameter. This second bug was inert, however, due to the first bug\n since the \"alpha == 0.0\" case was already being handled, resulting in\n the code block never executing.\n- (cherry picked from 60f36347c16e6336215cd52b4e5f3c0f96e7c253)\n\nUse 'void*' datatypes in kernel APIs. (#727)\n\nDetails:\n- Migrated all kernel APIs to use void* pointers instead of float*,\n double*, scomplex*, and dcomplex* pointers. This allows us to define\n many fewer kernel function pointer types, which also makes it much\n easier to know which function pointer type to use at any given time.\n (For example, whereas before there were ?axpyv_ker_ft, ?axpyv_ker_vft,\n and axpyv_ker_vft, now there is just axpyv_ker_ft, which is equivalent\n to what axpyv_ker_vft used to be.)\n- Refactored how kernel function prototypes and kernel function types\n are defined so as to reduce redundant code.
Specifically, the\n function signatures (excluding cntx_t* and, in the case of level-3\n microkernels, auxinfo_t*) are defined in new headers named, for\n example, bli_l1v_ker_params.h. Those signatures are reused via macro\n instantiation when defining both kernel prototypes and kernel function\n types. This will hopefully make it a little easier to update, add, and\n manage kernel APIs going forward.\n- Updated all reference kernels according to the aforementioned switch\n to void* pointers.\n- Updated all optimized kernels according to the aforementioned switch\n to void* pointers. This sometimes required renaming variables,\n inserting typecasts so that pointer arithmetic could continue to\n function as intended, and related tweaks.\n- Updated sandbox/gemmlike according to the aforementioned switch to\n void* pointers.\n- Renamed:\n - frame/1/bli_l1v_ft_ker.h -> frame/1/bli_l1v_ker_ft.h\n - frame/1f/bli_l1f_ft_ker.h -> frame/1f/bli_l1f_ker_ft.h\n - frame/1m/bli_l1m_ft_ker.h -> frame/1m/bli_l1m_ker_ft.h\n - frame/3/bli_l1m_ft_ukr.h -> frame/3/bli_l1m_ukr_ft.h\n - frame/3/bli_l3_sup_ft_ker.h -> frame/3/bli_l3_sup_ker_ft.h\n to better align with naming of neighboring files.\n- Added the missing \"void* params\" argument to bli_?packm_struc_cxk() in\n frame/1m/packm/bli_packm_struc_cxk.c. This argument is being passed\n into the function from bli_packm_blk_var1(), but wasn't being \"caught\"\n by the function definition itself. The function prototype for\n bli_?packm_struc_cxk() also needed updating.\n- Reordered the last two parameters in bli_?packm_struc_cxk().\n (Previously, the \"void* params\" was passed in after the\n \"const cntx_t* cntx\", although because of the above bug the params\n argument wasn't actually present in the function definition.)\n- (cherry picked from fab18dca46618799bb0b4f652820b33d36a5d4d4)\n\nUse 'const' pointers in kernel APIs.
(#722)\n\nDetails:\n- Qualified all input-only data pointers in the various kernel APIs with\n the 'const' keyword while also removing 'restrict' from those kernel\n APIs. (Use of 'restrict' was maintained in kernel implementations,\n where appropriate.) This affected the function pointer types defined\n for all of the kernels, their prototypes, and the reference and\n optimized kernel definitions' signatures.\n- Templatized the definitions of copys_mxn and xpbys_mxn static inline\n functions.\n- Minor whitespace and style changes (e.g. combining local variable\n declaration and initialization into a single statement).\n- Removed some unused kernel code left in 'old' directories.\n- Thanks to Nisanth M P for helping to validate changes to the power10\n microkernels.\n- (cherry picked from 93c63d1f469c4650df082d0fa2f29c46db0e25f5)","shortMessageHtmlLink":"Omit -fPIC if shared library build is disabled. (#732)"}},{"before":null,"after":"950d3099075e97a811806aae19b8abee501167b8","ref":"refs/heads/stable","pushedAt":"2024-05-21T22:55:50.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Restored ArmSVE general storage case. (#708)\n\nDetails:\n- Restored general storage case in armsve kernels.\n- Reason for doing this: Though real `g`-storage is difficult to\n speed up, the `g` codepath here can provide good support for\n transposed storage, i.e., it is at least good for `GEMM_UKR_SETUP_CT_AMBI`.\n- In our experience, this solution is only *a little* slower than in-reg\n transpose. Plus in-reg transpose is only possible for a fixed VL in\n our case.\n- (cherry picked from 4e18cd34f909c5045597f411340ede3a5e0bc5e1)\n\nRefined emacs handling of indentation.
(#717)\n\nDetails:\n- This refines the emacs autoformatting to be better in line with\n contribution guidelines.\n- Removed a stray shebang in a .mk file which confused emacs about the\n file mode, which should be makefile-mode. (emacs also removes stray\n whitespace at the ends of lines.)\n- (cherry picked from 0ba6e9eafb1e667373d9dbc2aa045557921f33e2)\n\nUpdated hpx namespace for make_count_shape. (#725)\n\nDetails:\n- The hpx namespace for *counting_shape changed. This PR updates the use\n of counting_shape in blis to comply with the change in hpx.\n- Co-authored-by: ctaylor \n- (cherry picked from 059f15105b1643fe56084f883c22b3cadf368b39)\n\nAdded an 'arm64' entry to `.travis.yml`. (#726)\n\nDetails:\n- Added a new 'arm64' entry to the .travis.yml file in an attempt to get\n Travis CI to compile both NEON and SVE kernels, even if only NEON\n kernels are exercised in the testing. With this new 'arm64' entry, the\n 'cortexa57' entry becomes redundant and may be removed. Thanks to\n RuQing Xu for this suggestion.\n- Previously, the macro BLIS_SIMD_MAX_SIZE was *not* being set in\n bli_kernels_arm64.h, which meant that the default value of 64 was\n being used. This caused a runtime consistency check to fail in\n bli_gks.c (in Travis CI), one which requires that\n\n mr * nr * dt_size <= BLIS_STACK_BUF_MAX_SIZE\n\n for all datatype sizes dt_size, where BLIS_STACK_BUF_MAX_SIZE is\n defined as\n\n BLIS_SIMD_MAX_NUM_REGISTERS * BLIS_SIMD_MAX_SIZE * 2\n\n This commit increases BLIS_SIMD_MAX_SIZE to 128 for the 'arm64'\n configuration, thus overriding the default and (hopefully) avoiding\n the aforementioned consistency check failures.\n- Appended '|| cat ./output.testsuite' to all 'make' commands in\n travis/do_testsuite.sh. Thanks to RuQing Xu for this suggestion.\n- Whitespace changes.\n- (cherry picked from 0b421eff130b5c896edcc09e7358d18564d177e9)\n\nRedirect grep stderr to /dev/null.
(#723)\n\nDetails:\n- In common.mk, added a redirection of stderr to /dev/null for the grep\n command being used to gather a list of header files #included from\n bli_cntx_ref.c. The redirection is desirable because as of grep 3.8,\n regular expressions with \"stray\" backslashes trigger warnings [1].\n But removing the backslash seems to break the BLIS build system when\n using pre-3.8 versions of grep, so this seems to be the easiest way to\n satisfy the BLIS build system for both pre- and post-3.8 grep\n environments.\n\n [1] https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html\n- (cherry picked from b1d3fc7e5b0927086e336a23f16ea59aa3611ccb)\n\nAdded runtime selection of 'power' config family.
However, there are some\n applications that may find utility in being able to access the version\n string by inspecting the monolithic (flattened) blis.h header file\n that is created at compile time and installed alongside the library.\n This commit moves the definition of BLIS_VERSION_STRING into\n bli_config.h (via the bli_config.h.in template) so that it is\n embedded in blis.h. The version string is now available in three\n places:\n - the static/shared library, which is installed in the 'lib'\n subdirectory of the install prefix (query-able via the\n bli_info_get_version_str() function);\n - the config.mk makefile fragment, which is installed in the 'share'\n subdirectory of the install prefix (in the VERSION variable);\n - the blis.h header file, which is installed in the 'include'\n subdirectory of the install prefix (via the BLIS_VERSION_STRING\n macro constant).\n Thanks to Mohsen Aznaveh and Tim Davis for providing the idea for this\n change.\n- CREDITS file update.\n- (cherry picked from e730c685d09336b3bd09e86c94330c4eba967f3e)\n\nTypecast printf() args to avoid compiler warnings. (#716)\n\nDetails:\n- In bli_thread_range_tlb.c, typecast integer arguments passed to\n printf() -- which are typically disabled unless debugging -- to type\n \"long\" to guarantee a match to the \"%ld\" format specifiers used in\n those calls. This avoids spurious warnings with certain compilers in\n certain toolchain environments, such as 32-bit RISC-V (rv32iv).\n- (cherry picked from dc5d00a6ce0350cd82859d8c24f23d98f205d8db)\n\nUse here-document for 'configure --help' output. (#714)\n\nDetails:\n- Changed the configure script function that outputs \"--help\" text to do\n so via so-called \"here-document\" syntax for improved readability and\n maintainability. 
The change eliminates hundreds of echo statements and\n makes it easier to change existing configure options' help text, along\n with other benefits such as eliminating the need to escape double-\n quote characters (\").\n- (cherry picked from ecbcf4008815035c695822fcaf106477debff89a)\n\nMerge tlb- and slab/rr-specific gemm macrokernels. (#711)\n\nDetails:\n- Merged the tlb-specific gemm macrokernel (_var2b) with the slab/rr-\n specific one (var2) so that a single function can be compiled with\n either tlb or slab/rr support, depending on the value of the\n BLIS_ENABLE_JRIR_TLB, _SLAB, and _RR. This is done by incorporating\n information from both approaches: the start/end/inc for the JR and IR\n loops from slab or rr partitioning; and the number of assigned\n microtiles, plus the starting IR dimension offset for all iterations\n after the first (ir_next). With these changes, slab, rr, and tlb can\n all be parameterized by initializing a similar set of variables prior\n to the jr loop.\n- Removed the wrap-around logic that sets the \"b_next\" field of the\n auxinfo_t struct, which executes during the last IR iteration of the\n last JR iteration. The potential benefit of this code is so minor\n (and hinges on the microkernel making use of the b_next field) that\n it's arguably not worth including. The code also does the wrong\n thing for some threads whenever JR_NT > 1, since only thread 0 (in the\n JR group) would even compute with the first micropanel of B.\n- Re-expressed the definition of bli_is_last_iter_slrr so that slab and\n tlb use the same code rather than rr and tlb.\n- Adjusted the initialization of the gemm control tree accordingly.\n- (cherry picked from c334ec278f5e2a101625629b2e13bbf1b38dede5)\n\nFixed mis-mapped instruction for VEXTRACTF64X2. (#713)\n\nDetails:\n- This commit fixes a typo in the macro definition for the extended\n inline assembly macro VEXTRACTF64X2 in bli_x86_asm_macros.h. 
The macro\n was previously defined (incorrectly) in terms of the vextractf64x4\n instruction rather than vextractf64x2.\n- CREDITS file update.\n- (cherry picked from 5793a77937aee9847a5692c8e44b36a6380800a1)\n\nDefined lt, lte, gt, gte + misc. other updates. (#712)\n\nDetails:\n- Changed invertsc operation to be a non-destructive operation; that is,\n it now takes separate input and output operands. This change applies\n to both the object and typed APIs.\n- Defined an alternative square root operation, sqrtrsc, which, when\n operating on complex scalars, assumes the imaginary part of the input\n to be zero.\n- Changed the semantics of addm, subm, copym, axpym, scal2m, and xpbym\n so that when the source matrix has an implicit unit diagonal, the\n operation leaves the diagonal of the destination matrix untouched.\n Previously, the operations would interpret an implicit unit diagonal\n on the source matrix as a request to manifest the unit diagonal\n *explicitly* on output (either as something to copy in the case of\n copym, or something to compute with in the cases of addm, subm, axpym,\n scal2m, and xpbym). It turns out that this behavior was too cute by\n half and could cause unintended headaches for practical use cases.\n (This change in behavior also required small modifications to the trmv\n and trsv testsuite modules so that they would properly test matrices\n with unit diagonals.)\n- Added missing dependencies for copym to gemv, ger, hemv, trmv, and\n trsv testsuite modules.\n- Implemented level-0-like ltsc, ltesc, gtsc, gtesc operations in\n frame/util, which use lt, lte, gt, and gte level-0 scalar macros.\n- Trivial variable rename in bli_part.c to harmonize with other\n variable naming conventions.\n- (cherry picked from 16d2e9ea9ca0853197b416eba701b840a8587bca)\n\nImplement cntx_t pointer caching in gks. 
(#709)\n\nDetails:\n- Refactored the gks cntx_t query functions so that: (1) there is a\n clearer pattern of similarity between functions that query a native\n context and those that query its induced (1m) counterpart; and (2)\n queried cntx_t pointers (for both native and induced cntx_t pointers)\n are cached (by default), or deep-queried upon each invocation,\n depending on whether the cpp macro BLIS_ENABLE_GKS_CACHING is defined.\n- Refactored query-related functions in bli_arch.c to cache the queried\n arch_t value (by default), or deep-query the arch_t value upon each\n invocation, depending on whether the cpp macro BLIS_ENABLE_GKS_CACHING is\n defined.\n- Tweaked the behavior of bli_gks_query_ind_cntx_impl() (formerly named\n bli_gks_query_ind_cntx()) so that the induced method cntx_t struct is\n repopulated each time the function is called. (It is still only\n allocated once on first call.) This was mostly done in preparation for\n some future in which the arch_t value might change at runtime. In such\n a scenario, the induced method context would need to be recalculated\n any time the native context changes.\n- Added preprocessor logic to bli_config_macro_defs.h to handle enabling\n or disabling of cntx_t pointer caching (via BLIS_ENABLE_GKS_CACHING).\n- For now, cntx_t pointer caching is enabled by default and does not\n correspond to any official configure option. Disabling can be done\n by inserting a #define for BLIS_DISABLE_GKS_CACHING into the\n appropriate bli_family_*.h header file within the configuration of\n interest.\n- Thanks to Harihara Sudhan S (AMD) for suggesting that cntx_t pointers\n (and not just arch_t values) be cached.\n- Comment updates.\n- (cherry picked from 9a366b14fe52c469f4664ef5dd93d85be8d97baa)\n\nFixing type-mismatch errors in power10 sandbox (#701)\n\nDetails:\n- This commit fixes a mismatch between the function type signature of\n bli_gemm_ex() required by BLIS and the version of the function defined\n within the power10 sandbox.
It also performs typecasting upon calling\n bli_gemm_front() to attain type consistency with the type signature\n defined by BLIS for bli_gemm_front().\n- (cherry picked from b895ec9f1f66fb93972589c06bff171337153a31)\n\nDefine new global scalar (obj_t) constants. (#703)\n\nDetails:\n- This commit defines the following new global scalar constants:\n - BLIS_ONE_I: This constant encodes the imaginary unit.\n - BLIS_MINUS_ONE_I: This constant encodes the negative imaginary unit.\n - BLIS_NAN: This constant encodes a not-a-number value. Both real and\n imaginary parts are set to NaN for complex datatypes.\n- (cherry picked from 38d88d5c131253066cad4f98eea06fa9299cae3b)\n\nDisable power10 kernels other than sgemm, dgemm. (#705)\n\nDetails:\n- There is a power10 sandbox which uses microkernels for datatypes other\n than float and double (or scomplex/dcomplex). In a regular power10-\n configured build (that is, with the sandbox disabled), there were\n compile errors for some of these other non-sgemm/non-dgemm\n microkernels. This commit protects those kernels with a new cpp macro\n guard (which is defined in sandbox/power10/bli_sandbox.h) that\n prevents that kernel code from being compiled for normal, non-sandbox\n power10 builds.\n- (cherry picked from cdb22b8ffa5b31a0c16ac1a7bcecefeb5216f669)\n\nFix k = 0 edge case in power10 microkernels (#706)\n\nDetails:\n- When power10 sgemm and dgemm microkernels are called with k = 0, they\n become caught in infinite loops and segfault. This is fixed now via an\n early exit in the case of k = 0.\n- (cherry picked from d220f9c436c0dae409974724d42ab6c52f12a726)\n\nFixed clang compiler warning in bli_l0_ft.h.\n\nDetails:\n- Fixed a type redefinition in frame/0/bli_l0_ft.h that unintentionally\n slipped in with commit 02b5acd6f.","shortMessageHtmlLink":"Restored ArmSVE general storage case. 
(#708)"}},{"before":"2307a4be4555ff1192f908e047402a09092371ba","after":null,"ref":"refs/heads/stable","pushedAt":"2024-05-21T22:55:13.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":null,"after":"950d3099075e97a811806aae19b8abee501167b8","ref":"refs/heads/stable-feb19-cand1","pushedAt":"2024-05-21T22:53:50.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Restored ArmSVE general storage case. (#708)\n\nDetails:\n- Restored general storage case in armsve kernels.\n- Reason for doing this: Though real `g`-storage is difficult to\n speed up, the `g` codepath here can provide good support for\n transposed storage, i.e., it is at least good for `GEMM_UKR_SETUP_CT_AMBI`.\n- In our experience, this solution is only *a little* slower than in-reg\n transpose. Plus in-reg transpose is only possible for a fixed VL in\n our case.\n- (cherry picked from 4e18cd34f909c5045597f411340ede3a5e0bc5e1)\n\nRefined emacs handling of indentation. (#717)\n\nDetails:\n- This refines the emacs autoformatting to be better in line with\n contribution guidelines.\n- Removed a stray shebang in a .mk file which confused emacs about the\n file mode, which should be makefile-mode. (emacs also removes stray\n whitespace at the ends of lines.)\n- (cherry picked from 0ba6e9eafb1e667373d9dbc2aa045557921f33e2)\n\nUpdated hpx namespace for make_count_shape. (#725)\n\nDetails:\n- The hpx namespace for *counting_shape changed. This PR updates the use\n of counting_shape in blis to comply with the change in hpx.\n- Co-authored-by: ctaylor \n- (cherry picked from 059f15105b1643fe56084f883c22b3cadf368b39)\n\nAdded an 'arm64' entry to `.travis.yml`.
(#726)\n\nDetails:\n- Added a new 'arm64' entry to the .travis.yml file in an attempt to get\n Travis CI to compile both NEON and SVE kernels, even if only NEON\n kernels are exercised in the testing. With this new 'arm64' entry, the\n 'cortexa57' entry becomes redundant and may be removed. Thanks to\n RuQing Xu for this suggestion.\n- Previously, the macro BLIS_SIMD_MAX_SIZE was *not* being set in\n bli_kernels_arm64.h, which meant that the default value of 64 was\n being used. This caused a runtime consistency check to fail in\n bli_gks.c (in Travis CI), one which requires that\n\n mr * nr * dt_size <= BLIS_STACK_BUF_MAX_SIZE\n\n for all datatype sizes dt_size, where BLIS_STACK_BUF_MAX_SIZE is\n defined as\n\n BLIS_SIMD_MAX_NUM_REGISTERS * BLIS_SIMD_MAX_SIZE * 2\n\n This commit increases BLIS_SIMD_MAX_SIZE to 128 for the 'arm64'\n configuration, thus overriding the default and (hopefully) avoiding\n the aforementioned consistency check failures.\n- Appended '|| cat ./output.testsuite' to all 'make' commands in\n travis/do_testsuite.sh. Thanks to RuQing Xu for this suggestion.\n- Whitespace changes.\n- (cherry picked from 0b421eff130b5c896edcc09e7358d18564d177e9)\n\nRedirect grep stderr to /dev/null. (#723)\n\nDetails:\n- In common.mk, added a redirection of stderr to /dev/null for the grep\n command being used to gather a list of header files #included from\n bli_cntx_ref.c. The redirection is desirable because as of grep 3.8,\n regular expressions with \"stray\" backslashes trigger warnings [1].\n But removing the backslash seems to break the BLIS build system when\n using pre-3.8 versions of grep, so this seems to be the easiest way to\n satisfy the BLIS build system for both pre- and post-3.8 grep\n environments.\n\n [1] https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html\n- (cherry picked from b1d3fc7e5b0927086e336a23f16ea59aa3611ccb)\n\nAdded runtime selection of 'power' config family.
(#718)\n\nDetails:\n- Created a 'power' umbrella configuration family, which, when targeted\n at configure-time, will build both 'power9' and 'power10' subconfigs.\n (With this feature, a BLIS shared library could be compiled on a\n power9 system and run on power10 and vice-versa. Unoptimised code\n will execute if it is linked and run on any other generic system.)\n- This new configuration family will only work with gcc, since that is\n the only compiler supported by both power9 and power10 subconfigs in\n BLIS.\n- Documented power9 and power10 as supported microarchitectures in the\n docs/HardwareSupport.md document.\n- (cherry picked from e3d352f1fcc93e6a46fde1aa4a7f0a18fb27bd42)\n\nDefine `BLIS_VERSION_STRING` in `blis.h`. (#720)\n\nDetails:\n- Previously, the version string was communicated from configure to\n config.mk (via the config.mk.in template), where it was included via\n the top-level Makefile, where it was then used to define the\n preprocessor macro BLIS_VERSION_STRING via a command line argument to\n the compiler (via -D). This macro is then used within bli_info.c to\n initialize a static string which can then be queried via the\n bli_info_get_version_str() function. However, there are some\n applications that may find utility in being able to access the version\n string by inspecting the monolithic (flattened) blis.h header file\n that is created at compile time and installed alongside the library.\n This commit moves the definition of BLIS_VERSION_STRING into\n bli_config.h (via the bli_config.h.in template) so that it is\n embedded in blis.h. 
The version string is now available in three\n places:\n - the static/shared library, which is installed in the 'lib'\n subdirectory of the install prefix (query-able via the\n bli_info_get_version_str() function);\n - the config.mk makefile fragment, which is installed in the 'share'\n subdirectory of the install prefix (in the VERSION variable);\n - the blis.h header file, which is installed in the 'include'\n subdirectory of the install prefix (via the BLIS_VERSION_STRING\n macro constant).\n Thanks to Mohsen Aznaveh and Tim Davis for providing the idea for this\n change.\n- CREDITS file update.\n- (cherry picked from e730c685d09336b3bd09e86c94330c4eba967f3e)\n\nTypecast printf() args to avoid compiler warnings. (#716)\n\nDetails:\n- In bli_thread_range_tlb.c, typecast integer arguments passed to\n printf() -- which are typically disabled unless debugging -- to type\n \"long\" to guarantee a match to the \"%ld\" format specifiers used in\n those calls. This avoids spurious warnings with certain compilers in\n certain toolchain environments, such as 32-bit RISC-V (rv32iv).\n- (cherry picked from dc5d00a6ce0350cd82859d8c24f23d98f205d8db)\n\nUse here-document for 'configure --help' output. (#714)\n\nDetails:\n- Changed the configure script function that outputs \"--help\" text to do\n so via so-called \"here-document\" syntax for improved readability and\n maintainability. The change eliminates hundreds of echo statements and\n makes it easier to change existing configure options' help text, along\n with other benefits such as eliminating the need to escape double-\n quote characters (\").\n- (cherry picked from ecbcf4008815035c695822fcaf106477debff89a)\n\nMerge tlb- and slab/rr-specific gemm macrokernels. (#711)\n\nDetails:\n- Merged the tlb-specific gemm macrokernel (_var2b) with the slab/rr-\n specific one (var2) so that a single function can be compiled with\n either tlb or slab/rr support, depending on the value of the\n BLIS_ENABLE_JRIR_TLB, _SLAB, and _RR. 
This is done by incorporating\n information from both approaches: the start/end/inc for the JR and IR\n loops from slab or rr partitioning; and the number of assigned\n microtiles, plus the starting IR dimension offset for all iterations\n after the first (ir_next). With these changes, slab, rr, and tlb can\n all be parameterized by initializing a similar set of variables prior\n to the jr loop.\n- Removed the wrap-around logic that sets the \"b_next\" field of the\n auxinfo_t struct, which executes during the last IR iteration of the\n last JR iteration. The potential benefit of this code is so minor\n (and hinges on the microkernel making use of the b_next field) that\n it's arguably not worth including. The code also does the wrong\n thing for some threads whenever JR_NT > 1, since only thread 0 (in the\n JR group) would even compute with the first micropanel of B.\n- Re-expressed the definition of bli_is_last_iter_slrr so that slab and\n tlb use the same code rather than rr and tlb.\n- Adjusted the initialization of the gemm control tree accordingly.\n- (cherry picked from c334ec278f5e2a101625629b2e13bbf1b38dede5)\n\nFixed mis-mapped instruction for VEXTRACTF64X2. (#713)\n\nDetails:\n- This commit fixes a typo in the macro definition for the extended\n inline assembly macro VEXTRACTF64X2 in bli_x86_asm_macros.h. The macro\n was previously defined (incorrectly) in terms of the vextractf64x4\n instruction rather than vextractf64x2.\n- CREDITS file update.\n- (cherry picked from 5793a77937aee9847a5692c8e44b36a6380800a1)\n\nDefined lt, lte, gt, gte + misc. other updates. (#712)\n\nDetails:\n- Changed invertsc operation to be a non-destructive operation; that is,\n it now takes separate input and output operands. 
This change applies\n to both the object and typed APIs.\n- Defined an alternative square root operation, sqrtrsc, which, when\n operating on complex scalars, assumes the imaginary part of the input\n to be zero.\n- Changed the semantics of addm, subm, copym, axpym, scal2m, and xpbym\n so that when the source matrix has an implicit unit diagonal, the\n operation leaves the diagonal of the destination matrix untouched.\n Previously, the operations would interpret an implicit unit diagonal\n on the source matrix as a request to manifest the unit diagonal\n *explicitly* on output (either as something to copy in the case of\n copym, or something to compute with in the cases of addm, subm, axpym,\n scal2m, and xpbym). It turns out that this behavior was too cute by\n half and could cause unintended headaches for practical use cases.\n (This change in behavior also required small modifications to the trmv\n and trsv testsuite modules so that they would properly test matrices\n with unit diagonals.)\n- Added missing dependencies for copym to gemv, ger, hemv, trmv, and\n trsv testsuite modules.\n- Implemented level-0-like ltsc, ltesc, gtsc, gtesc operations in\n frame/util, which use lt, lte, gt, and gte level-0 scalar macros.\n- Trivial variable rename in bli_part.c to harmonize with other\n variable naming conventions.\n- (cherry picked from 16d2e9ea9ca0853197b416eba701b840a8587bca)\n\nImplement cntx_t pointer caching in gks. 
(#709)\n\nDetails:\n- Refactored the gks cntx_t query functions so that: (1) there is a\n clearer pattern of similarity between functions that query a native\n context and those that query its induced (1m) counterpart; and (2)\n queried cntx_t pointers (for both native and induced cntx_t pointers)\n are cached (by default), or deep-queried upon each invocation,\n depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is defined.\n- Refactored query-related functions in bli_arch.c to cache the queried\n arch_t value (by default), or deep-query the arch_t value upon each\n invocation, depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is\n defined.\n- Tweaked the behavior of bli_gks_query_ind_cntx_impl() (formerly named\n bli_gks_query_ind_cntx()) so that the induced method cntx_t struct is\n repopulated each time the function is called. (It is still only\n allocated once on first call.) This was mostly done in preparation for\n some future in which the arch_t value might change at runtime. In such\n a scenario, the induced method context would need to be recalculated\n any time the native context changes.\n- Added preprocessor logic to bli_config_macro_defs.h to handle enabling\n or disabling of cntx_t pointer caching (via BLIS_ENABLE_GKS_CACHING).\n- For now, cntx_t pointer caching is enabled by default and does not\n correspond to any official configure option. Disabling can be done\n by inserting a #define for BLIS_DISABLE_GKS_CACHING into the\n appropriate bli_family_*.h header file within the configuration of\n interest.\n- Thanks to Harihara Sudhan S (AMD) for suggesting that cntx_t pointers\n (and not just arch_t values) be cached.\n- Comment updates.\n- (cherry picked from 9a366b14fe52c469f4664ef5dd93d85be8d97baa)\n\nFixing type-mismatch errors in power10 sandbox (#701)\n\nDetails:\n- This commit fixes a mismatch between the function type signature of\n bli_gemm_ex() required by BLIS and the version of the function defined\n within the power10 sandbox.
It also performs typecasting upon calling\n bli_gemm_front() to attain type consistency with the type signature\n defined by BLIS for bli_gemm_front().\n- (cherry picked from b895ec9f1f66fb93972589c06bff171337153a31)\n\nDefine new global scalar (obj_t) constants. (#703)\n\nDetails:\n- This commit defines the following new global scalar constants:\n - BLIS_ONE_I: This constant encodes the imaginary unit.\n - BLIS_MINUS_ONE_I: This constant encodes the negative imaginary unit.\n - BLIS_NAN: This constant encodes a not-a-number value. Both real and\n imaginary parts are set to NaN for complex datatypes.\n- (cherry picked from 38d88d5c131253066cad4f98eea06fa9299cae3b)\n\nDisable power10 kernels other than sgemm, dgemm. (#705)\n\nDetails:\n- There is a power10 sandbox which uses microkernels for datatypes other\n than float and double (or scomplex/dcomplex). In a regular power10-\n configured build (that is, with the sandbox disabled), there were\n compile errors for some of these other non-sgemm/non-dgemm\n microkernels. This commit protects those kernels with a new cpp macro\n guard (which is defined in sandbox/power10/bli_sandbox.h) that\n prevents that kernel code from being compiled for normal, non-sandbox\n power10 builds.\n- (cherry picked from cdb22b8ffa5b31a0c16ac1a7bcecefeb5216f669)\n\nFix k = 0 edge case in power10 microkernels (#706)\n\nDetails:\n- When power10 sgemm and dgemm microkernels are called with k = 0, they\n become caught in infinite loops and segfault. This is fixed now via an\n early exit in the case of k = 0.\n- (cherry picked from d220f9c436c0dae409974724d42ab6c52f12a726)\n\nFixed clang compiler warning in bli_l0_ft.h.\n\nDetails:\n- Fixed a type redefinition in frame/0/bli_l0_ft.h that unintentionally\n slipped in with commit 02b5acd6f.","shortMessageHtmlLink":"Restored ArmSVE general storage case. 
(#708)"}},{"before":null,"after":"9730d6cc0a667723d112371f3f3d6b00af216490","ref":"refs/heads/stable-mar24-cand0","pushedAt":"2024-05-21T22:25:59.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Omit -fPIC if shared library build is disabled. (#732)\n\nDetails:\n- Updated common.mk so that when --disable-shared option is given to\n configure:\n 1. The -fPIC compiler flag is omitted from the individual\n configuration family members' CPICFLAGS variables (which are\n initialized in each subconfig's make_defs.mk file); and\n 2. The BUILD_SYMFLAGS variable, which contains compiler flags needed\n to control the symbol export behavior, is left blank.\n- The net result of these changes is that flags specific to shared\n library builds are only used when a shared library is actually\n scheduled to be built. Thanks to Nick Knight for reporting this issue.\n- CREDITS file update.\n- (cherry picked from commit 5f841307f668f65b7ed5a479bd8374d2581208cf)\n\nUpdated configure to pass all shellcheck checks. (#729)\n\nDetails:\n- Modified configure so that it passes all 'shellcheck' checks,\n disabling ones which we violate but which are just stylistic, or are\n special cases in our code.\n- Miscellaneous other minor changes, such as rearranged redirections in\n long sed/perl pipes to look more natural.\n- Whitespace tweaks.\n- (cherry picked from 72c37eb80f964b7840377076e5009aec5b29d320)\n\nFixed bugs in scal2v ref kernel when alpha == 1. 
(#728)\n\nDetails:\n- Fixed a typo bug in ref_kernels/1/bli_scal2v_ref.c where the\n conditional that was supposed to be checking for cases when alpha is\n equal to 1.0 (so that copyv could be used instead of scal2v) was\n instead erroneously comparing alpha against 0.0.\n- Fixed another bug in the same function whereby BLIS_NO_CONJUGATE was\n erroneously being passed into copyv instead of the kernel's conjx\n parameter. This second bug was inert, however, due to the first bug\n since the \"alpha == 0.0\" case was already being handled, resulting in\n the code block never executing.\n- (cherry picked from 60f36347c16e6336215cd52b4e5f3c0f96e7c253)\n\nUse 'void*' datatypes in kernel APIs. (#727)\n\nDetails:\n- Migrated all kernel APIs to use void* pointers instead of float*,\n double*, scomplex*, and dcomplex* pointers. This allows us to define\n many fewer kernel function pointer types, which also makes it much\n easier to know which function pointer type to use at any given time.\n (For example, whereas before there were ?axpyv_ker_ft, ?axpyv_ker_vft,\n and axpyv_ker_vft, now there is just axpyv_ker_ft, which is equivalent\n to what axpyv_ker_vft used to be.)\n- Refactored how kernel function prototypes and kernel function types\n are defined so as to reduce redundant code. Specifically, the\n function signatures (excluding cntx_t* and, in the case of level-3\n microkernels, auxinfo_t*) are defined in new headers named, for\n example, bli_l1v_ker_params.h. Those signatures are reused via macro\n instantiation when defining both kernel prototypes and kernel function\n types. This will hopefully make it a little easier to update, add, and\n manage kernel APIs going forward.\n- Updated all reference kernels according to the aforementioned switch\n to void* pointers.\n- Updated all optimized kernels according to the aforementioned switch\n to void* pointers.
This sometimes required renaming variables,\n inserting typecasting so that pointer arithmetic could continue to\n function as intended, and related tweaks.\n- Updated sandbox/gemmlike according to the aforementioned switch to\n void* pointers.\n- Renamed:\n - frame/1/bli_l1v_ft_ker.h -> frame/1/bli_l1v_ker_ft.h\n - frame/1f/bli_l1f_ft_ker.h -> frame/1f/bli_l1f_ker_ft.h\n - frame/1m/bli_l1m_ft_ker.h -> frame/1m/bli_l1m_ker_ft.h\n - frame/3/bli_l1m_ft_ukr.h -> frame/3/bli_l1m_ukr_ft.h\n - frame/3/bli_l3_sup_ft_ker.h -> frame/3/bli_l3_sup_ker_ft.h\n to better align with naming of neighboring files.\n- Added the missing \"void* params\" argument to bli_?packm_struc_cxk() in\n frame/1m/packm/bli_packm_struc_cxk.c. This argument is being passed\n into the function from bli_packm_blk_var1(), but wasn't being \"caught\"\n by the function definition itself. The function prototype for\n bli_?packm_struc_cxk() also needed updating.\n- Reordered the last two parameters in bli_?packm_struc_cxk().\n (Previously, the \"void* params\" was passed in after the\n \"const cntx_t* cntx\", although because of the above bug the params\n argument wasn't actually present in the function definition.)\n- (cherry picked from fab18dca46618799bb0b4f652820b33d36a5d4d4)\n\nUse 'const' pointers in kernel APIs. (#722)\n\nDetails:\n- Qualified all input-only data pointers in the various kernel APIs with\n the 'const' keyword while also removing 'restrict' from those kernel\n APIs. (Use of 'restrict' was maintained in kernel implementations,\n where appropriate.) This affected the function pointer types defined\n for all of the kernels, their prototypes, and the reference and\n optimized kernel definitions' signatures.\n- Templatized the definitions of copys_mxn and xpbys_mxn static inline\n functions.\n- Minor whitespace and style changes (e.g. 
combining local variable\n declaration and initialization into a single statement).\n- Removed some unused kernel code left in 'old' directories.\n- Thanks to Nisanth M P for helping to validate changes to the power10\n microkernels.\n- (cherry picked from 93c63d1f469c4650df082d0fa2f29c46db0e25f5)","shortMessageHtmlLink":"Omit -fPIC if shared library build is disabled. (#732)"}},{"before":"a45f3da2579585b64e1e67a03ac4b9faf957b54c","after":"2307a4be4555ff1192f908e047402a09092371ba","ref":"refs/heads/stable","pushedAt":"2024-05-21T21:32:06.000Z","pushType":"push","commitsCount":17,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Restored ArmSVE general storage case. (#708)\n\nDetails:\n- Restored general storage case in armsve kernels.\n- Reason for doing this: Though real `g`-storage is difficult to\n speed up, the `g` code path here can provide good support for\n transposed storage, i.e., it is at least good for `GEMM_UKR_SETUP_CT_AMBI`.\n- In our experience, this solution is only *a little* slower than in-reg\n transpose.
This PR updates the use\n of counting_shape in blis to comply with the change in hpx.\n- Co-authored-by: ctaylor \n- (cherry picked from 059f15105b1643fe56084f883c22b3cadf368b39)\n\nAdded an 'arm64' entry to `.travis.yml`. (#726)\n\nDetails:\n- Added a new 'arm64' entry to the .travis.yml file in an attempt to get\n Travis CI to compile both NEON and SVE kernels, even if only NEON\n kernels are exercised in the testing. With this new 'arm64' entry, the\n 'cortexa57' entry becomes redundant and may be removed. Thanks to\n RuQing Xu for this suggestion.\n- Previously, the macro BLIS_SIMD_MAX_SIZE was *not* being set in\n bli_kernels_arm64.h, which meant that the default value of 64 was\n being used. This caused a runtime consistency check to fail in\n bli_gks.c (in Travis CI), one which requires that\n\n mr * nr * dt_size <= BLIS_STACK_BUF_MAX_SIZE\n\n for all datatype sizes dt_size, where BLIS_STACK_BUF_MAX_SIZE is\n defined as\n\n BLIS_SIMD_MAX_NUM_REGISTERS * BLIS_SIMD_MAX_SIZE * 2\n\n This commit increases BLIS_SIMD_MAX_SIZE to 128 for the 'arm64'\n configuration, thus overriding the default and (hopefully) avoiding\n the aforementioned consistency check failures.\n- Appended '|| cat ./output.testsuite' to all 'make' commands in\n travis/do_testsuite.sh.
The redirection is desirable because as of grep 3.8,\n regular expressions with \"stray\" backslashes trigger warnings [1].\n But removing the backslash seems to break the BLIS build system when\n using pre-3.8 versions of grep, so this seems to be the easiest way to\n satisfy the BLIS build system for both pre- and post-3.8 grep\n environments.\n\n [1] https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html\n- (cherry picked from b1d3fc7e5b0927086e336a23f16ea59aa3611ccb)\n\nAdded runtime selection of 'power' config family. (#718)\n\nDetails:\n- Created a 'power' umbrella configuration family, which, when targeted\n at configure-time, will build both 'power9' and 'power10' subconfigs.\n (With this feature, a BLIS shared library could be compiled on a\n power9 system and run on power10 and vice-versa. Unoptimised code\n will execute if it is linked and run on any other generic system.)\n- This new configuration family will only work with gcc, since that is\n the only compiler supported by both power9 and power10 subconfigs in\n BLIS.\n- Documented power9 and power10 as supported microarchitectures in the\n docs/HardwareSupport.md document.\n- (cherry picked from e3d352f1fcc93e6a46fde1aa4a7f0a18fb27bd42)\n\nDefine `BLIS_VERSION_STRING` in `blis.h`. (#720)\n\nDetails:\n- Previously, the version string was communicated from configure to\n config.mk (via the config.mk.in template), where it was included via\n the top-level Makefile, where it was then used to define the\n preprocessor macro BLIS_VERSION_STRING via a command line argument to\n the compiler (via -D). This macro is then used within bli_info.c to\n initialize a static string which can then be queried via the\n bli_info_get_version_str() function.
However, there are some\n applications that may find utility in being able to access the version\n string by inspecting the monolithic (flattened) blis.h header file\n that is created at compile time and installed alongside the library.\n This commit moves the definition of BLIS_VERSION_STRING into\n bli_config.h (via the bli_config.h.in template) so that it is\n embedded in blis.h. The version string is now available in three\n places:\n - the static/shared library, which is installed in the 'lib'\n subdirectory of the install prefix (query-able via the\n bli_info_get_version_str() function);\n - the config.mk makefile fragment, which is installed in the 'share'\n subdirectory of the install prefix (in the VERSION variable);\n - the blis.h header file, which is installed in the 'include'\n subdirectory of the install prefix (via the BLIS_VERSION_STRING\n macro constant).\n Thanks to Mohsen Aznaveh and Tim Davis for providing the idea for this\n change.\n- CREDITS file update.\n- (cherry picked from e730c685d09336b3bd09e86c94330c4eba967f3e)\n\nTypecast printf() args to avoid compiler warnings. (#716)\n\nDetails:\n- In bli_thread_range_tlb.c, typecast integer arguments passed to\n printf() -- which are typically disabled unless debugging -- to type\n \"long\" to guarantee a match to the \"%ld\" format specifiers used in\n those calls. This avoids spurious warnings with certain compilers in\n certain toolchain environments, such as 32-bit RISC-V (rv32iv).\n- (cherry picked from dc5d00a6ce0350cd82859d8c24f23d98f205d8db)\n\nUse here-document for 'configure --help' output. (#714)\n\nDetails:\n- Changed the configure script function that outputs \"--help\" text to do\n so via so-called \"here-document\" syntax for improved readability and\n maintainability. 
The change eliminates hundreds of echo statements and\n makes it easier to change existing configure options' help text, along\n with other benefits such as eliminating the need to escape double-\n quote characters (\").\n- (cherry picked from ecbcf4008815035c695822fcaf106477debff89a)\n\nMerge tlb- and slab/rr-specific gemm macrokernels. (#711)\n\nDetails:\n- Merged the tlb-specific gemm macrokernel (_var2b) with the slab/rr-\n specific one (var2) so that a single function can be compiled with\n either tlb or slab/rr support, depending on the value of the\n BLIS_ENABLE_JRIR_TLB, _SLAB, and _RR. This is done by incorporating\n information from both approaches: the start/end/inc for the JR and IR\n loops from slab or rr partitioning; and the number of assigned\n microtiles, plus the starting IR dimension offset for all iterations\n after the first (ir_next). With these changes, slab, rr, and tlb can\n all be parameterized by initializing a similar set of variables prior\n to the jr loop.\n- Removed the wrap-around logic that sets the \"b_next\" field of the\n auxinfo_t struct, which executes during the last IR iteration of the\n last JR iteration. The potential benefit of this code is so minor\n (and hinges on the microkernel making use of the b_next field) that\n it's arguably not worth including. The code also does the wrong\n thing for some threads whenever JR_NT > 1, since only thread 0 (in the\n JR group) would even compute with the first micropanel of B.\n- Re-expressed the definition of bli_is_last_iter_slrr so that slab and\n tlb use the same code rather than rr and tlb.\n- Adjusted the initialization of the gemm control tree accordingly.\n- (cherry picked from c334ec278f5e2a101625629b2e13bbf1b38dede5)\n\nFixed mis-mapped instruction for VEXTRACTF64X2. (#713)\n\nDetails:\n- This commit fixes a typo in the macro definition for the extended\n inline assembly macro VEXTRACTF64X2 in bli_x86_asm_macros.h. 
The macro\n was previously defined (incorrectly) in terms of the vextractf64x4\n instruction rather than vextractf64x2.\n- CREDITS file update.\n- (cherry picked from 5793a77937aee9847a5692c8e44b36a6380800a1)\n\nDefined lt, lte, gt, gte + misc. other updates. (#712)\n\nDetails:\n- Changed invertsc operation to be a non-destructive operation; that is,\n it now takes separate input and output operands. This change applies\n to both the object and typed APIs.\n- Defined an alternative square root operation, sqrtrsc, which, when\n operating on complex scalars, assumes the imaginary part of the input\n to be zero.\n- Changed the semantics of addm, subm, copym, axpym, scal2m, and xpbym\n so that when the source matrix has an implicit unit diagonal, the\n operation leaves the diagonal of the destination matrix untouched.\n Previously, the operations would interpret an implicit unit diagonal\n on the source matrix as a request to manifest the unit diagonal\n *explicitly* on output (either as something to copy in the case of\n copym, or something to compute with in the cases of addm, subm, axpym,\n scal2m, and xpbym). It turns out that this behavior was too cute by\n half and could cause unintended headaches for practical use cases.\n (This change in behavior also required small modifications to the trmv\n and trsv testsuite modules so that they would properly test matrices\n with unit diagonals.)\n- Added missing dependencies for copym to gemv, ger, hemv, trmv, and\n trsv testsuite modules.\n- Implemented level-0-like ltsc, ltesc, gtsc, gtesc operations in\n frame/util, which use lt, lte, gt, and gte level-0 scalar macros.\n- Trivial variable rename in bli_part.c to harmonize with other\n variable naming conventions.\n- (cherry picked from 16d2e9ea9ca0853197b416eba701b840a8587bca)\n\nImplement cntx_t pointer caching in gks. 
(#709)\n\nDetails:\n- Refactored the gks cntx_t query functions so that: (1) there is a\n clearer pattern of similarity between functions that query a native\n context and those that query its induced (1m) counterpart; and (2)\n queried cntx_t pointers (for both native and induced cntx_t pointers)\n are cached (by default), or deep-queried upon each invocation,\n depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is defined.\n- Refactored query-related functions in bli_arch.c to cache the queried\n arch_t value (by default), or deep-query the arch_t value upon each\n invocation, depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is\n defined.\n- Tweaked the behavior of bli_gks_query_ind_cntx_impl() (formerly named\n bli_gks_query_ind_cntx()) so that the induced method cntx_t struct is\n repopulated each time the function is called. (It is still only\n allocated once on first call.) This was mostly done in preparation for\n some future in which the arch_t value might change at runtime. In such\n a scenario, the induced method context would need to be recalculated\n any time the native context changes.\n- Added preprocessor logic to bli_config_macro_defs.h to handle enabling\n or disabling of cntx_t pointer caching (via BLIS_ENABLE_GKS_CACHING).\n- For now, cntx_t pointer caching is enabled by default and does not\n correspond to any official configure option. Disabling can be done\n by inserting a #define for BLIS_DISABLE_GKS_CACHING into the\n appropriate bli_family_*.h header file within the configuration of\n interest.\n- Thanks to Harihara Sudhan S (AMD) for suggesting that cntx_t pointers\n (and not just arch_t values) be cached.\n- Comment updates.\n- (cherry picked from 9a366b14fe52c469f4664ef5dd93d85be8d97baa)\n\nFixing type-mismatch errors in power10 sandbox (#701)\n\nDetails:\n- This commit fixes a mismatch between the function type signature of\n bli_gemm_ex() required by BLIS and the version of the function defined\n within the power10 sandbox.
It also performs typecasting upon calling\n bli_gemm_front() to attain type consistency with the type signature\n defined by BLIS for bli_gemm_front().\n- (cherry picked from b895ec9f1f66fb93972589c06bff171337153a31)\n\nDefine new global scalar (obj_t) constants. (#703)\n\nDetails:\n- This commit defines the following new global scalar constants:\n - BLIS_ONE_I: This constant encodes the imaginary unit.\n - BLIS_MINUS_ONE_I: This constant encodes the negative imaginary unit.\n - BLIS_NAN: This constant encodes a not-a-number value. Both real and\n imaginary parts are set to NaN for complex datatypes.\n- (cherry picked from 38d88d5c131253066cad4f98eea06fa9299cae3b)\n\nDisable power10 kernels other than sgemm, dgemm. (#705)\n\nDetails:\n- There is a power10 sandbox which uses microkernels for datatypes other\n than float and double (or scomplex/dcomplex). In a regular power10-\n configured build (that is, with the sandbox disabled), there were\n compile errors for some of these other non-sgemm/non-dgemm\n microkernels. This commit protects those kernels with a new cpp macro\n guard (which is defined in sandbox/power10/bli_sandbox.h) that\n prevents that kernel code from being compiled for normal, non-sandbox\n power10 builds.\n- (cherry picked from cdb22b8ffa5b31a0c16ac1a7bcecefeb5216f669)\n\nFix k = 0 edge case in power10 microkernels (#706)\n\nDetails:\n- When power10 sgemm and dgemm microkernels are called with k = 0, they\n become caught in infinite loops and segfault. This is fixed now via an\n early exit in the case of k = 0.\n- (cherry picked from d220f9c436c0dae409974724d42ab6c52f12a726)","shortMessageHtmlLink":"Restored ArmSVE general storage case. (#708)"}},{"before":"2329d99016fe1aeb86da4552295f497543cea311","after":null,"ref":"refs/heads/1m_row_col_problem","pushedAt":"2024-05-21T21:20:33.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. 
Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":"5112e1859e7f8888f5555eb7bc02bd9fab9b4442","after":null,"ref":"refs/heads/rt","pushedAt":"2024-05-21T21:18:12.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":"e90e7f309b3f2760a01e8e09a29bf702754fa2b5","after":null,"ref":"refs/heads/win-pthreads","pushedAt":"2024-05-21T21:15:05.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":"a32e8a47c022b6071302b2956af5728976c83ca9","after":null,"ref":"refs/heads/travis","pushedAt":"2024-05-21T21:06:04.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":null,"after":"2307a4be4555ff1192f908e047402a09092371ba","ref":"refs/heads/stable-feb19-cand0","pushedAt":"2024-05-20T20:26:23.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Restored ArmSVE general storage case. (#708)\n\nDetails:\n- Restored general storage case in armsve kernels.\n- Reason for doing this: Though real `g`-storage is difficult to\n speed up, the `g` code path here can provide good support for\n transposed storage, i.e., it is at least good for `GEMM_UKR_SETUP_CT_AMBI`.\n- In our experience, this solution is only *a little* slower than in-reg\n transpose.
Plus in-reg transpose is only possible for a fixed VL in\n our case.\n- (cherry picked from commit 4e18cd34f909c5045597f411340ede3a5e0bc5e1)\n\nRefined emacs handling of indentation. (#717)\n\nDetails:\n- This refines the emacs autoformatting to be better in line with\n contribution guidelines.\n- Removed a stray shebang in a .mk file which confuses emacs about the\n file mode, which should be makefile-mode. (emacs also removes stray\n whitespace at the ends of lines.)\n- (cherry picked from 0ba6e9eafb1e667373d9dbc2aa045557921f33e2)\n\nUpdated hpx namespace for make_count_shape. (#725)\n\nDetails:\n- The hpx namespace for *counting_shape changed. This PR updates the use\n of counting_shape in blis to comply with the change in hpx.\n- Co-authored-by: ctaylor \n- (cherry picked from 059f15105b1643fe56084f883c22b3cadf368b39)\n\nAdded an 'arm64' entry to `.travis.yml`. (#726)\n\nDetails:\n- Added a new 'arm64' entry to the .travis.yml file in an attempt to get\n Travis CI to compile both NEON and SVE kernels, even if only NEON\n kernels are exercised in the testing. With this new 'arm64' entry, the\n 'cortexa57' entry becomes redundant and may be removed. Thanks to\n RuQing Xu for this suggestion.\n- Previously, the macro BLIS_SIMD_MAX_SIZE was *not* being set in\n bli_kernels_arm64.h, which meant that the default value of 64 was\n being used. This caused a runtime consistency check to fail in\n bli_gks.c (in Travis CI), one which requires that\n\n mr * nr * dt_size <= BLIS_STACK_BUF_MAX_SIZE\n\n for all datatype sizes dt_size, where BLIS_STACK_BUF_MAX_SIZE is\n defined as\n\n BLIS_SIMD_MAX_NUM_REGISTERS * BLIS_SIMD_MAX_SIZE * 2\n\n This commit increases BLIS_SIMD_MAX_SIZE to 128 for the 'arm64'\n configuration, thus overriding the default and (hopefully) avoiding\n the aforementioned consistency check failures.\n- Appended '|| cat ./output.testsuite' to all 'make' commands in\n travis/do_testsuite.sh.
Thanks to RuQing Xu for this suggestion.\n- Whitespace changes.\n- (cherry picked from 0b421eff130b5c896edcc09e7358d18564d177e9)\n\nRedirect grep stderr to /dev/null. (#723)\n\nDetails:\n- In common.mk, added a redirection of stderr to /dev/null for the grep\n command being used to gather a list of header files #included from\n bli_cntx_ref.c. The redirection is desirable because as of grep 3.8,\n regular expressions with \"stray\" backslashes trigger warnings [1].\n But removing the backslash seems to break the BLIS build system when\n using pre-3.8 versions of grep, so this seems to be the easiest way to\n satisfy the BLIS build system for both pre- and post-3.8 grep\n environments.\n\n [1] https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html\n- (cherry picked from b1d3fc7e5b0927086e336a23f16ea59aa3611ccb)\n\nAdded runtime selection of 'power' config family. (#718)\n\nDetails:\n- Created a 'power' umbrella configuration family, which, when targeted\n at configure-time, will build both 'power9' and 'power10' subconfigs.\n (With this feature, a BLIS shared library could be compiled on a\n power9 system and run on power10 and vice-versa. Unoptimised code\n will execute if it is linked and run on any other generic system.)\n- This new configuration family will only work with gcc, since that is\n the only compiler supported by both power9 and power10 subconfigs in\n BLIS.\n- Documented power9 and power10 as supported microarchitectures in the\n docs/HardwareSupport.md document.\n- (cherry picked from e3d352f1fcc93e6a46fde1aa4a7f0a18fb27bd42)\n\nDefine `BLIS_VERSION_STRING` in `blis.h`. (#720)\n\nDetails:\n- Previously, the version string was communicated from configure to\n config.mk (via the config.mk.in template), where it was included via\n the top-level Makefile, where it was then used to define the\n preprocessor macro BLIS_VERSION_STRING via a command line argument to\n the compiler (via -D).
This macro is then used within bli_info.c to\n initialize a static string which can then be queried via the\n bli_info_get_version_str() function. However, there are some\n applications that may find utility in being able to access the version\n string by inspecting the monolithic (flattened) blis.h header file\n that is created at compile time and installed alongside the library.\n This commit moves the definition of BLIS_VERSION_STRING into\n bli_config.h (via the bli_config.h.in template) so that it is\n embedded in blis.h. The version string is now available in three\n places:\n - the static/shared library, which is installed in the 'lib'\n subdirectory of the install prefix (query-able via the\n bli_info_get_version_str() function);\n - the config.mk makefile fragment, which is installed in the 'share'\n subdirectory of the install prefix (in the VERSION variable);\n - the blis.h header file, which is installed in the 'include'\n subdirectory of the install prefix (via the BLIS_VERSION_STRING\n macro constant).\n Thanks to Mohsen Aznaveh and Tim Davis for providing the idea for this\n change.\n- CREDITS file update.\n- (cherry picked from e730c685d09336b3bd09e86c94330c4eba967f3e)\n\nTypecast printf() args to avoid compiler warnings. (#716)\n\nDetails:\n- In bli_thread_range_tlb.c, typecast integer arguments passed to\n printf() -- which are typically disabled unless debugging -- to type\n \"long\" to guarantee a match to the \"%ld\" format specifiers used in\n those calls. This avoids spurious warnings with certain compilers in\n certain toolchain environments, such as 32-bit RISC-V (rv32iv).\n- (cherry picked from dc5d00a6ce0350cd82859d8c24f23d98f205d8db)\n\nUse here-document for 'configure --help' output. (#714)\n\nDetails:\n- Changed the configure script function that outputs \"--help\" text to do\n so via so-called \"here-document\" syntax for improved readability and\n maintainability. 
The change eliminates hundreds of echo statements and\n makes it easier to change existing configure options' help text, along\n with other benefits such as eliminating the need to escape double-\n quote characters (\").\n- (cherry picked from ecbcf4008815035c695822fcaf106477debff89a)\n\nMerge tlb- and slab/rr-specific gemm macrokernels. (#711)\n\nDetails:\n- Merged the tlb-specific gemm macrokernel (_var2b) with the slab/rr-\n specific one (var2) so that a single function can be compiled with\n either tlb or slab/rr support, depending on the value of the\n BLIS_ENABLE_JRIR_TLB, _SLAB, and _RR. This is done by incorporating\n information from both approaches: the start/end/inc for the JR and IR\n loops from slab or rr partitioning; and the number of assigned\n microtiles, plus the starting IR dimension offset for all iterations\n after the first (ir_next). With these changes, slab, rr, and tlb can\n all be parameterized by initializing a similar set of variables prior\n to the jr loop.\n- Removed the wrap-around logic that sets the \"b_next\" field of the\n auxinfo_t struct, which executes during the last IR iteration of the\n last JR iteration. The potential benefit of this code is so minor\n (and hinges on the microkernel making use of the b_next field) that\n it's arguably not worth including. The code also does the wrong\n thing for some threads whenever JR_NT > 1, since only thread 0 (in the\n JR group) would even compute with the first micropanel of B.\n- Re-expressed the definition of bli_is_last_iter_slrr so that slab and\n tlb use the same code rather than rr and tlb.\n- Adjusted the initialization of the gemm control tree accordingly.\n- (cherry picked from c334ec278f5e2a101625629b2e13bbf1b38dede5)\n\nFixed mis-mapped instruction for VEXTRACTF64X2. (#713)\n\nDetails:\n- This commit fixes a typo in the macro definition for the extended\n inline assembly macro VEXTRACTF64X2 in bli_x86_asm_macros.h. 
The macro\n was previously defined (incorrectly) in terms of the vextractf64x4\n instruction rather than vextractf64x2.\n- CREDITS file update.\n- (cherry picked from 5793a77937aee9847a5692c8e44b36a6380800a1)\n\nDefined lt, lte, gt, gte + misc. other updates. (#712)\n\nDetails:\n- Changed invertsc operation to be a non-destructive operation; that is,\n it now takes separate input and output operands. This change applies\n to both the object and typed APIs.\n- Defined an alternative square root operation, sqrtrsc, which, when\n operating on complex scalars, assumes the imaginary part of the input\n to be zero.\n- Changed the semantics of addm, subm, copym, axpym, scal2m, and xpbym\n so that when the source matrix has an implicit unit diagonal, the\n operation leaves the diagonal of the destination matrix untouched.\n Previously, the operations would interpret an implicit unit diagonal\n on the source matrix as a request to manifest the unit diagonal\n *explicitly* on output (either as something to copy in the case of\n copym, or something to compute with in the cases of addm, subm, axpym,\n scal2m, and xpbym). It turns out that this behavior was too cute by\n half and could cause unintended headaches for practical use cases.\n (This change in behavior also required small modifications to the trmv\n and trsv testsuite modules so that they would properly test matrices\n with unit diagonals.)\n- Added missing dependencies for copym to gemv, ger, hemv, trmv, and\n trsv testsuite modules.\n- Implemented level-0-like ltsc, ltesc, gtsc, gtesc operations in\n frame/util, which use lt, lte, gt, and gte level-0 scalar macros.\n- Trivial variable rename in bli_part.c to harmonize with other\n variable naming conventions.\n- (cherry picked from 16d2e9ea9ca0853197b416eba701b840a8587bca)\n\nImplement cntx_t pointer caching in gks. 
(#709)\n\nDetails:\n- Refactored the gks cntx_t query functions so that: (1) there is a\n clearer pattern of similarity between functions that query a native\n context and those that query its induced (1m) counterpart; and (2)\n queried cntx_t pointers (for both native and induced cntx_t pointers)\n are cached (by default), or deep-queried upon each invocation,\n depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is defined.\n- Refactored query-related functions in bli_arch.c to cache the queried\n arch_t value (by default), or deep-query the arch_t value upon each\n invocation, depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is\n defined.\n- Tweaked the behavior of bli_gks_query_ind_cntx_impl() (formerly named\n bli_gks_query_ind_cntx()) so that the induced method cntx_t struct is\n repopulated each time the function is called. (It is still only\n allocated once on first call.) This was mostly done in preparation for\n some future in which the arch_t value might change at runtime. In such\n a scenario, the induced method context would need to be recalculated\n any time the native context changes.\n- Added preprocessor logic to bli_config_macro_defs.h to handle enabling\n or disabling of cntx_t pointer caching (via BLIS_ENABLE_GKS_CACHING).\n- For now, cntx_t pointer caching is enabled by default and does not\n correspond to any official configure option. Disabling can be done\n by inserting a #define for BLIS_DISABLE_GKS_CACHING into the\n appropriate bli_family_*.h header file within the configuration of\n interest.\n- Thanks to Harihara Sudhan S (AMD) for suggesting that cntx_t pointers\n (and not just arch_t values) be cached.\n- Comment updates.\n- (cherry picked from 9a366b14fe52c469f4664ef5dd93d85be8d97baa)\n\nFixing type-mismatch errors in power10 sandbox (#701)\n\nDetails:\n- This commit fixes a mismatch between the function type signature of\n bli_gemm_ex() required by BLIS and the version of the function defined\n within the power10 sandbox.
It also performs typecasting upon calling\n bli_gemm_front() to attain type consistency with the type signature\n defined by BLIS for bli_gemm_front().\n- (cherry picked from b895ec9f1f66fb93972589c06bff171337153a31)\n\nDefine new global scalar (obj_t) constants. (#703)\n\nDetails:\n- This commit defines the following new global scalar constants:\n - BLIS_ONE_I: This constant encodes the imaginary unit.\n - BLIS_MINUS_ONE_I: This constant encodes the negative imaginary unit.\n - BLIS_NAN: This constant encodes a not-a-number value. Both real and\n imaginary parts are set to NaN for complex datatypes.\n- (cherry picked from 38d88d5c131253066cad4f98eea06fa9299cae3b)\n\nDisable power10 kernels other than sgemm, dgemm. (#705)\n\nDetails:\n- There is a power10 sandbox which uses microkernels for datatypes other\n than float and double (or scomplex/dcomplex). In a regular power10-\n configured build (that is, with the sandbox disabled), there were\n compile errors for some of these other non-sgemm/non-dgemm\n microkernels. This commit protects those kernels with a new cpp macro\n guard (which is defined in sandbox/power10/bli_sandbox.h) that\n prevents that kernel code from being compiled for normal, non-sandbox\n power10 builds.\n- (cherry picked from cdb22b8ffa5b31a0c16ac1a7bcecefeb5216f669)\n\nFix k = 0 edge case in power10 microkernels (#706)\n\nDetails:\n- When power10 sgemm and dgemm microkernels are called with k = 0, they\n become caught in infinite loops and segfault. This is fixed now via an\n early exit in the case of k = 0.\n- (cherry picked from d220f9c436c0dae409974724d42ab6c52f12a726)","shortMessageHtmlLink":"Restored ArmSVE general storage case. (#708)"}},{"before":null,"after":"8c29b37d7405ac37aab537a8071b7fa9b7d01dc3","ref":"refs/heads/stable-jan10-cand0","pushedAt":"2024-05-20T20:04:15.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. 
Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Tile-level partitioning in jr/ir loops (ex-trsm). (#695)\n\nDetails:\n- Reimplemented parallelization of the JR loop in gemmt (which is\n recycled for herk, her2k, syrk, and syr2k). Previously, the\n rectangular region of the current MC x NC panel of C would be\n parallelized separately from the diagonal region of that same\n submatrix, with the rectangular portion being assigned to threads via\n slab or round-robin (rr) partitioning (as determined at configure-\n time) and the diagonal region being assigned via round-robin. This\n approach did not work well when extracting lots of parallelism from\n the JR loop and was often suboptimal even for smaller degrees of\n parallelism. This commit implements tile-level load balancing (tlb) in\n which the IR loop is effectively subjugated in service of more\n equitably dividing work in the JR loop. This approach is especially\n potent for certain situations where the diagonal region of the MC x NR\n panel of C is significant relative to the entire region. However, it\n also seems to benefit many problem sizes of other level-3 operations\n (excluding trsm, which has an inherent algorithmic dependency in the\n IR loop that prevents the application of tlb). For now, tlb is\n implemented as _var2b.c macrokernels for gemm (which forms the basis\n for gemm, hemm, and symm), gemmt (which forms the basis of herk,\n her2k, syrk, and syr2k), and trmm (which forms the basis of trmm and\n trmm3). Which function pointers (_var2() or _var2b()) are embedded in\n the control tree will depend on whether the BLIS_ENABLE_JRIR_TLB cpp\n macro is defined, which is controlled by the value passed to the\n existing --thread-part-jrir=METHOD (or -r METHOD) configure option.\n This script adds 'tlb' as a valid option alongside the previously\n supported values of 'slab' and 'rr'.
('slab' is still the default.)\n Thanks to Leick Robinson for abstractly inspiring this work, and to\n Minh Quan Ho for inquiring (in PR #562, and before that in Issue #437)\n about the possibility of improved load balance in macrokernel loops,\n and even prototyping what it might look like, long before I fully\n understood the problem.\n- In bli_thread_range_weighted_sub(), tweaked the way we compute the\n area of the current MC x NC trapezoidal panel of C by better taking\n into account the microtile structure along the diagonal. Previously,\n it was an underestimate, as it assumed MR = NR = 1 (that is, it\n assumed that the microtile column of C that overlapped with microtiles\n exactly coincided with the diagonal). Now, we only assume MR = NR.\n This is still a slight underestimate when MR != NR, so the additional\n area is scaled by 1.5 in a hackish attempt to compensate for this, as\n well as other additional effects that are difficult to model (such as\n the increased cost of writing to temporary tiles before finally\n updating C). The net effect of this better estimation of the\n trapezoidal area should be (on average) slightly larger regions\n assigned to threads that have little or no overlap with the diagonal\n region (and correspondingly slightly smaller regions in the diagonal\n region), which we expect will lead to slightly better load balancing\n in most situations.\n- Spun off the contents of bli_thread.[ch] that relate to computing\n thread ranges into one of three source/header file pairs:\n - bli_thread_range.[ch], which define functions that are not specific\n to the jr/ir loops;\n - bli_thread_range_slab_rr.[ch], which define functions that implement\n slab or round-robin partitioning for the jr/ir loops;\n - bli_thread_range_tlb.[ch], which define functions that implement\n tlb for the jr/ir loops.\n- Fixed the computation of a_next in the last iteration of the IR loop\n in bli_gemmt_l_ker_var2().
Previously, it always \"wrapped\" back around\n to the first micropanel of the current MC x KC packed block of A.\n However, this is almost never actually the micropanel that is used\n next. A new macro, bli_gemmt_l_wrap_a_upanel(), computes a_next\n correctly, with a similarly named bli_gemmt_u_wrap_a_upanel() for use\n in the upper-stored case (which *does* actually always choose the\n first micropanel of A as its a_next at the end of the IR loop).\n- Removed adjustments for a_next/b_next (a2/b2) for the diagonal-\n intersecting case of gemmt_l_ker_var2() and the above-diagonal case\n of gemmt_u_ker_var2() since these cases will only coincide with the\n last iteration of the IR loop in very small problems.\n- Defined bli_is_last_iter_l() and bli_is_last_iter_u(), the latter of\n which explicitly considers whether the current microtile is the last\n tile that intersects the diagonal. (The former does the same, but the\n computation coincides with the original bli_is_last_iter().) These\n functions are now used in gemmt to test when a_next (or a2) should\n \"wrap\" (as discussed above). 
Also defined bli_is_last_iter_tlb_l()\n and bli_is_last_iter_tlb_u(), which are similar to the aforementioned\n functions but are used when employing tlb in gemmt.\n- Redefined macros in bli_packm_thrinfo.h, which test whether an\n iteration of work is assigned to a thread, as static inline functions\n in bli_param_macro_defs.h (and then deleted bli_packm_thrinfo.h).\n In the process of redefining these macros, I also renamed them from\n bli_packm_my_iter_rr/sl() to bli_is_my_iter_rr/sl().\n- Renamed\n bli_thread_range_jrir_rr() -> bli_thread_range_rr()\n bli_thread_range_jrir_sl() -> bli_thread_range_sl()\n bli_thread_range_jrir() -> bli_thread_range_slrr()\n- Renamed\n bli_is_last_iter() -> bli_is_last_iter_slrr()\n- Defined\n bli_info_get_thread_jrir_tlb()\n and renamed:\n - bli_info_get_thread_part_jrir_slab() ->\n bli_info_get_thread_jrir_slab()\n - bli_info_get_thread_part_jrir_rr() ->\n bli_info_get_thread_jrir_rr()\n- Modified bli_rntm_set_ways_for_op() to redirect IR loop parallelism\n into the JR loop when tlb is enabled for non-trsm level-3 operations.\n- Added a sanity check to prevent bli_prune_unref_mparts() from being\n used on packed objects. This prohibition is necessary because the\n current implementation does not take into account the atomicity of\n packed micropanel widths relative to the diagonal of structured\n matrices. That is, the function prunes greedily without regard to\n whether doing so would prune off part of a micropanel *which has\n already been packed* and assigned to a thread for inclusion in the\n computation.\n- Further restricted early returns in bli_prune_unref_mparts() to\n situations where the primary matrix is not only of general structure\n but also dense (in terms of its uplo_t value). The addition of the\n matrix's dense-ness to the conditional is required because gemmt is\n somewhat unusual in that its C matrix has general structure but is\n marked as lower- or upper-stored via its uplo_t. 
By only checking\n for general structure, attempts to prune gemmt C matrices would\n incorrectly result in early returns, even though that operation\n effectively treats the matrix as symmetric (and stored in only one\n triangle).\n- Fixed a latent bug in bli_thread_range_rr() wherein incorrect ranges\n were computed when 1 < bf. Thankfully, this bug was not yet\n manifesting since all current invocations used bf == 1.\n- Fixed a latent bug in some unexercised code in bli_?gemmt_l_ker_var2()\n that would perform incorrect pruning of unreferenced regions above\n where the diagonal of a lower-stored matrix intersects the right edge.\n Thankfully, the bug was not harming anything since those unreferenced\n regions were being pruned prior to the macrokernel.\n- Rewrote slab/rr-based gemmt macrokernels so that they no longer carved\n C into rectangular and diagonal regions prior to parallelizing each\n separately. The new macrokernels use a unified loop structure where\n quadratic (slab) partitioning is used.\n- Updated all level-3 macrokernels to have a more uniform coding style,\n such as wrt combining variable declarations with initializations as\n well as the use of const.\n- Updated bls_l3_packm_var[123].c to use bli_thrinfo_n_way() and\n bli_thrinfo_work_id() instead of bli_thrinfo_num_threads() and\n bli_thrinfo_thread_id(), respectively. This change probably should\n have been included in aeb5f0c.\n- Removed old prototypes in bli_gemmt_var.h and bli_trmm_var.h that\n corresponded to functions that were removed in aeb5f0c.\n- Other very minor cleanups.\n- Comment updates.\n- (cherry picked from commit 2e1ba9d13c23a06a7b6f8bd326af428f7ea68c31)","shortMessageHtmlLink":"Tile-level partitioning in jr/ir loops (ex-trsm). (#695)"}},{"before":null,"after":"656463948e236e529e9c9cf48fe60d5a62da2119","ref":"refs/heads/stable-jan6-cand0","pushedAt":"2024-05-20T20:04:10.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. 
Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Refactor structure awareness in packm_blk_var1.c. (#707)\n\nDetails:\n- Factored some of the structure awareness out of the loop in\n bli_packm_blk_var1(). So instead of having a single loop with\n conditionals in the body to handle various kinds of structure (and\n stored/unstored submatrix placement), we now have a conditional branch\n to handle various structure/storage scenarios with a loop in each\n section. This change was originally motivated to choose slab or round-\n robin partitioning (in the context of triangular matrices) based on\n the structure of the entire block (or panel) being packed rather than\n each micropanel individually. Previously, the code would attempt to\n limit rr to the portion of the block that intersects the diagonal and\n use slab for the remainder. However, that approach was not well-thought\n out and in many situations this would lead to inferior load balancing\n when compared to using round-robin for the entire block (or panel).\n This commit has the added benefit of incurring less overhead during\n the packing process now that each of the new loops is simpler.\n- (cherry picked from commit b6735ca26b9d459d9253795dc5841ae8de9e84c9)\n\nSwitch to l3 sup decorator in gemmlike sandbox. (#704)\n\nDetails:\n- Modified the gemmlike sandbox to call bli_l3_sup_thread_decorator()\n rather than a local analogue of that code. This reduces redundant\n logic and makes it easier for the sandbox to inherit future\n improvements to the framework's threading code.\n- Moved addon/gemmd to addon/old/gemmd. This code has fallen out of date\n and is taking too much effort to maintain. We will very likely\n reimplement it completely once future changes are made to the\n framework proper.\n- (cherry picked from f956b79922da412791e4c8b8b846b3aafc0a5ee0)","shortMessageHtmlLink":"Refactor structure awareness in packm_blk_var1.c. 
(#707)"}},{"before":null,"after":"b8ffda1e2df635b6c624917bd0ef8de629756691","ref":"refs/heads/stable-dec16-cand0","pushedAt":"2024-05-20T19:12:05.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Skip 1m optimization when forcing hemm_l/symm_l. (#697)\n\nDetails:\n- Fixed a bug in right-sided hemm when:\n - using the 1m method,\n - #defining BLIS_DISABLE_HEMM_RIGHT in the active subconfiguration,\n and\n - the storage of C matches the gemm microkernel IO preference PRIOR to\n the right-sidedness being detected and recast in terms of the left-\n side code path.\n It turns out that bli_gemm_ind_recast_1m_params() was applying its\n optimization (recasting a complex-domain macrokernel calling a 1m\n virtual microkernel to a real-domain macrokernel calling the real-\n domain microkernel) in situations in which it should not have. The\n optimization was silently assuming that the storage of C always\n matched that of the microkernel preference, since the front-end (in\n this case, bli_hemm_front()) would have already had a chance to\n transpose the operation to bring the two into agreement. However, by\n disabling right-sided hemm, we deprive BLIS of that flexibility (as a\n transposed left-sided case would necessarily have to become a right-\n sided case), and thus the assumption was no longer holding in all\n cases. Thanks to Nisanth M P for reporting this bug in Issue #621.\n- The aforementioned bug, and its bugfix, also apply to symm when\n BLIS_DISABLE_SYMM_RIGHT is defined.\n- Comment updates.\n- CREDITS file update.\n- (cherry picked from commit 3accacf57d11e9b109339754f91bf22329b6cb6a)\n\nFixed perf of mt sup with packing, and mt gemmlike. 
(#696)\n\nDetails:\n- Brought the gemmsup code path up to date relative to the latest\n thrinfo_t semantics introduced in the October Omnibus commit\n (aeb5f0c). This was done by passing the prenode (instead of the\n current node) into the packm variant within bli_l3_sup_packm.c as well\n as creating the prenodes and attaching them to the thrinfo_t tree in\n bli_l3_sup_thrinfo_create(). These changes erase the performance\n degradation introduced in the omnibus when running multithreaded sup\n with optional packing enabled. Special thanks to Devin Matthews for\n sussing out this fix in short order.\n- Fixed the gemmlike sandbox in a manner similar to that of sup with\n packing, described above. This also involved passing the prenode into\n the local gemmlike packm variant. (Recall that gemmlike recycles the\n use of bli_l3_sup_thrinfo_create(), so it automatically inherits that\n part of the sup fix described above.)\n- Updated bls_l3_packm_var[123].c to use bli_thrinfo_n_way() and\n bli_thrinfo_work_id() instead of bli_thrinfo_num_threads() and\n bli_thrinfo_thread_id(), respectively.\n- (cherry picked from 4833ba224eba54df3f349bcb7e188bcc53442449)\n\nFixed _gemm_small() prototype; disabled gemm_small.\n\nDetails:\n- Fixed a mismatch between the prototype for bli_gemm_small() in\n bli_gemm_front.h and the actual definition of bli_gemm_small() in\n kernels/zen/3/bli_gemm_small.c. The former was erroneously declaring\n the cntl_t* argument as 'const'. Thanks to Jeff Diamond for reporting\n this issue.\n- Commented out BLIS_ENABLE_SMALL_MATRIX, BLIS_ENABLE_SMALL_MATRIX_TRSM\n macro definitions in config/zen3/bli_family_zen3.h. 
AMD's small matrix\n implementation should probably remain disabled in vanilla BLIS, at\n least for now.\n- (cherry picked from db10dd8e11a12d85017f84455558a82c0093b1da)\n\nTrivial whitespace/comment tweaks.\n\nDetails:\n- Trivial whitespace and comment changes, most of which ideally would\n have been part of the previous commit pertaining to HPX (2b05948).\n- (cherry picked from f0337b784d164ae505ca0e11277a1155680500d1)\n\nblis support for hpx (#682)\n\n- Implement threading backend via HPX.\n- HPX is an asynchronous many-task runtime system used in high\n performance computing applications. The runtime implements the\n ISO C++ parallelism specification and provides a user-space\n thread implementation.\n- This PR provides BLIS a thread backend implementation using HPX\n and resolves feature request #681. The configuration script,\n makefiles, and testsuite have been updated to support an HPX\n build option. The addition of HPX support provides other\n developers an exemplar for integrating other C++ threading\n backends into BLIS.\n- (cherry picked from 2b05948ad2c9785bc53f376d53a7141cbc917447)\n\nFixed subtle barrier_fpa bug in bli_thrcomm.c. (#690)\n\nDetails:\n- In bli_thrcomm.c, correctly initialize the BLIS_OPENMP element of the\n barrier function pointer array (barrier_fpa) to NULL when\n BLIS_ENABLE_OPENMP is *not* defined. Similarly, initialize the\n BLIS_POSIX element of barrier_fpa to NULL when BLIS_ENABLE_PTHREADS is\n not enabled. This bug was introduced in a1a5a9b and was likely the\n result of an incomplete edit.
The effects of the bug would have\n likely manifested when querying a thrcomm_t that was initialized with\n a timpl_t value corresponding to a threading implementation that was\n omitted from the -t option at configure-time.\n- (cherry picked from e1ea25da43508925e33d4e57e420cfc0a9de793f)\n\nEnhance emacs formatting of C files to remove trailing whitespace and ensure\n a newline at the end of the file\n- (cherry picked from dc6e5f3f5770074ba38554541b8b64711a68c084)\n\nDelete mpi_test garbage. (#689)\n\nDetails:\n- tlrmchlsmth: \"What even is this? No comments, no commit message, not\n used by anything. Trash.\"\n- (cherry picked from 713d078075a4a563a43d83fd0880ab5091c2e4a4)\n\nSome decluttering of the top-level directory.\n\nDetails:\n- Relocated 'mpi_test' directory to test/mpi_test.\n- Relocated 'so_version' and 'version' files from top-level directory to\n 'build' directory.\n- Updated build/bump-version.sh script to accommodate relocation of\n 'version' file to 'build' directory.\n- Updated configure script to accommodate relocation of 'so_version'\n file to 'build' directory.\n- Updated INSTALL file to replace pointers to blis-devel mailing list\n with a pointer to docs/Discord.md.\n- Updated RELEASING file to contain a reminder to consider whether the\n so_version file should be updated prior to the release.\n- (cherry picked from 8d813f7f12732d52c95570ae884d5defbfd19234)\n\nFix typo in configure --help text. (#686)\n\nDetails:\n- Fixed a misspelling in the --help description for the --int-size (-i)\n configure option.\n- (cherry picked from 6774bf08c92fc6983706a91bbb93b960e8eef285)\n\nSupport --nosup, --sup configure options. (#684)\n\nDetails:\n- Added --nosup and --sup as alternative ways of requesting that sup be\n disabled or enabled. These are analogous to --disable-sup-handling and\n --enable-sup-handling, respectively.
(I got tired of typing out\n --disable-sup-handling and needed a shorthand notation.)\n- Tweaked message output by configure when sup is enabled/disabled for\n clarity and specificity.\n- Whitespace changes.\n- (cherry picked from edcc2f9940449f7d9cefcfc02159d27b013e7995)\n\nAdd mention of Wilkinson Prize to README.md. (#683)\n\nDetails:\n- Added blurbs and links to Wilkinson Prize to README.md.\n- Added mention of both Best Paper and Wilkinson Prizes to the top of\n README.md.\n- Other minor tweaks.\n- (cherry picked from 5eea6ad9eb25f37685d1ae4ae08c73cd1daca297)","shortMessageHtmlLink":"Skip 1m optimization when forcing hemm_l/symm_l. (#697)"}},{"before":"aa62b3dd8d6be878d48329845636799e49c610c7","after":null,"ref":"refs/heads/plugins","pushedAt":"2024-05-15T18:38:00.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":"4cf2a99832c7e2c572493d358d972ed3da3b0f4e","after":null,"ref":"refs/heads/stable-oct27-cand4","pushedAt":"2024-05-15T18:27:23.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":"a72680080dc446ec4f948a9b6be114f77d5ed8b1","after":null,"ref":"refs/heads/stable-oct27-cand3","pushedAt":"2024-05-15T18:27:03.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":"efebd1fe46ecd6b814922551ffdb6fc9e936b6e9","after":null,"ref":"refs/heads/stable-oct27-cand2","pushedAt":"2024-05-15T18:26:58.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G.
Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":"01e151a9658cbe07ee0cac8b03fa13fef26df19e","after":"6d0ab74f6975fdf4d19cee06d946b09b6ca89656","ref":"refs/heads/master","pushedAt":"2024-05-06T21:07:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Updates to README.md section on downloading.\n\nDetails:\n- Updated the text in README.md in the \"How to Download BLIS\" section.\n The new text no longer recommends that the reader use the 'master'\n branch over official releases, as the previous text did. The text was\n tweaked since (a) the 'master' branch is now akin to a development\n branch, and (b) the reader will no longer forgo bugfixes by sticking\n to official releases since we will (going forward) publish bugfix\n releases for the most recent version.","shortMessageHtmlLink":"Updates to README.md section on downloading."}},{"before":"06dddf1e51ccff70d77ee8cb731c3217e70eb730","after":"01e151a9658cbe07ee0cac8b03fa13fef26df19e","ref":"refs/heads/master","pushedAt":"2024-05-06T20:40:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Updated RELEASING file; fixes to ReleaseNotes.md.\n\nDetails:\n- Updated RELEASING file to reflect new release protocols, given the\n more sophisticated policy of maintaining release candidate branches\n separate from 'master' (which is now more akin to a development\n branch). Further refinements to this file will likely follow.\n- Fixed typos in ReleaseNotes.md. 
Thanks to Robert van de Geijn for\n reporting these.","shortMessageHtmlLink":"Updated RELEASING file; fixes to ReleaseNotes.md."}},{"before":null,"after":"49af2243c2a60ed8fedb44f237f4ec100465cd89","ref":"refs/heads/1.0-final","pushedAt":"2024-05-06T19:19:01.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"ReleaseNotes.md update.\n\nDetails:\n- (cherry picked from commit 06dddf1e51ccff70d77ee8cb731c3217e70eb730)\n\nCHANGELOG update (1.0)\n\nDetails:\n- (cherry picked from commit a876918c8c79a1c3d3d95de1f283350b7249b8ae)\n\nVersion file update (1.0)\n\nDetails:\n- (cherry picked from commit c2af113c7ba6d0dcc128ba36ec6e140d89180cf3)","shortMessageHtmlLink":"ReleaseNotes.md update."}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEV14N5gA","startCursor":null,"endCursor":null}},"title":"Activity ยท flame/blis"}