
NextGen Analytics Ramping up on TriBITS Refactor work #434

Closed
14 of 16 tasks
bartlettroscoe opened this issue Dec 9, 2021 · 49 comments

@bartlettroscoe
Member

bartlettroscoe commented Dec 9, 2021

Parent Issue:

Description

This issue is an outline for ramping up to help out with the TriBITS Refactor work being tracked in the TriBITS Refactor board and the EPIC #367.

Tasks

@bartlettroscoe
Member Author

bartlettroscoe commented Dec 10, 2021

Hello @mperrinel and @marcinwrobel1986,

Here is the Issue with a set of tasks to start ramping up and provide some initial feedback. Please let me know if you have any questions about any of the tasks listed above.

For the review of specific files listed above, I think you can review the files by adding references like this:

function(tribits_external_package_write_config_file tplName tplConfigFile)
tribits_external_package_write_config_file_str(${tplName} tplConfigFileStr)
file(WRITE "${tplConfigFile}" "${tplConfigFileStr}")
endfunction()

in a comment in this issue (or in a separate issue just for that set of files) and then saying something about it in that comment.

Another option for doing a review, which is more work to set up, would be to create a dummy branch for an older version of 'master', then copy the files to be reviewed into another dummy branch taken off of that older version of the code, and then set up a PR with the updated dummy branch against the older reference dummy branch. That way, the PR would show just the files to be reviewed in their current state and would allow doing a review just like any other PR. The only difference is that you would not merge the PR at the end of the review. If you are interested in that approach, I can set up a demo and we can try it out. Just let me know.

@bartlettroscoe
Member Author

Given the creation of #437, can I assume this has started?

bartlettroscoe moved this from Selected to In Progress in TriBITS Refactor Dec 15, 2021
@bartlettroscoe
Member Author

@mperrinel

As for Spack builds of Trilinos, please:

  • Use current version of Trilinos 'develop' branch
  • Enable the TPLs as shown in PR testing (here and here), which as of 1/5/2022 are the 13 TPLs reported in the configure output "Final set of enabled TPLs: Pthread MPI BLAS LAPACK Boost Scotch ParMETIS Zlib HDF5 Netcdf SuperLU BoostLib DLlib 13"
  • Use the same versions of the TPLs as used in Trilinos PR testing (see the version numbers in the directory names such as netcdf/4.7.3 here), although Spack may know the most current version that works.

@mperrinel
Collaborator

mperrinel commented Jan 17, 2022

@bartlettroscoe

Review of https://tribits.org/doc/TribitsUsersGuide.html#how-to-use-find-package-for-a-tribits-tpl (How to use find_package() for a TriBITS TPL) checkbox 5.1:
I understand the idea of a FindTPL<PackageName>.cmake file like FindTPLHDF5.cmake. But I don't really understand the need to generate a <tplName>Config.cmake file from the FindTPL<tplName>.cmake file. Of course, I understand the need to create the "all_libs" target. My problem is with the idea of generating a Config file from a Module file. From the CMake find_package() documentation, a Module file (Find<PackageName>.cmake) is generally provided by something external to the project (like CMake itself on different OSes), or by anyone who needs to import a package for which a Config file is not available or is insufficient. As a result, a Module file can quickly become out of date. On the other hand, a Config file (<PackageName>Config.cmake) is typically installed as part of the package and shouldn't become out of date, since it is updated along with the package itself.
So for me, even if the Module file and the Config file have basically the same goal (e.g. providing package variables and/or, in a more modern CMake approach, package IMPORTED targets), they are clearly different in nature (one lives outside of the package and the other ships with the package). That's why having a Module file generate a Config file seems strange to me.

Does that make sense?

Why can't we simply put the generated Config code directly inside the FindTPL Module file?
Moreover, as you indicate at the end of section 9.5, generating such a Config file can conflict with the one created by the developers of the package itself (this seems to be the case for HDF5).
Maybe an alternative would be to change the name of this file (e.g. the generated HDF5Config.cmake file could become HDF5TPLConfig.cmake) to distinguish the two. I think we can specify the name of the Config file to import with the NAMES option of find_package().

@bartlettroscoe
Member Author

@mperrinel

The glue between existing <tplName>Config.cmake files using modern CMake targets but lacking a <tplName>::all_libs target is the least well-defined part of this whole effort so don't take what you read on this too seriously. What is there now is a straw-man that works and we can improve on.

The basic requirements are:

  • Every installed TriBITS package's IMPORTED targets must be able to link to and find the upstream packages' <upstreamPackage>::all_libs targets

That is about it.

But I don't really understand the need of generating a <tplName>Config.cmake from the FindTPL<tplName>.cmake file

The idea is that the FindTPL<tplName>.cmake is a single point of injection where an external TPL is made to behave like a compliant package that provides the <tplName>::all_libs target. And that includes needing to provide the <tplName>::all_libs target for the installed downstream TriBITS package IMPORTED targets to find. This may seem strange but it satisfies the requirements and provides a single point of specialization to hook up any external package into this system.

The problem is for me about the idea of generating a Config file from a Module file. From CMake find_package documentation, we can read that a Module file (Find<PackageName>.cmake is generally provided by something which is external of the project (like CMake itself in different OS), or by anyone who needs to import a package for which a Config file is not available or insufficient.

Note that a FindTPL<tplName>.cmake is never found by a find_package(<tplName> ...) command because it does not have the standard name Find<tplName>.cmake. That is the point. The purpose of FindTPL<tplName>.cmake is to provide what is needed to find and inject an external package/TPL into a TriBITS project, not to satisfy what is needed by find_package().

Again, how to do this glue for external packages is up in the air right now and I don't see any really great solutions. We just need a well-documented and solid approach that we know will work. That will evolve as we complete #299 and #63, and that is okay.

One thing that would make it easier and more robust to implement these specialized FindTPL<tplName>.cmake modules is to have a version of find_package() that finds a package (and therefore returns <tplName>_DIR) but does not load any of its targets. That way, we could generate a wrapper <tplName>Config.cmake file that includes the found external <tplName>Config.cmake file (from <tplName>_DIR) and then defines the INTERFACE IMPORTED <tplName>::all_libs target, and then just include that generated wrapper file <buildDir>/external_packages/<tplName>/<tplName>Config.cmake in the CMake project itself. This is consistent with what happens with the TPL specifications using library names, library dirs, and include dirs, implemented here, which shows:

  tribits_external_package_write_config_file(${TPL_NAME} "${tplConfigFile}")
  if (NOT ${PROJECT_NAME}_ENABLE_INSTALLATION_TESTING)
    include("${tplConfigFile}")
  endif()

Basically, the same code that generates the IMPORTED targets for external clients gets used to create these targets for the internal CMake targets. That is the idea.
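To make the wrapper-file idea above more concrete, here is a minimal Python sketch that only generates the text of such a wrapper <tplName>Config.cmake: it includes the external package's own config file from <tplName>_DIR and then defines an INTERFACE IMPORTED <tplName>::all_libs target. The helper names (wrapper_config_text, write_wrapper_config), the package Foo, its config file name, and its target Foo::foo are all hypothetical; this is an illustration under those assumptions, not how TriBITS actually generates these files.

```python
from __future__ import annotations

from pathlib import Path


def wrapper_config_text(tpl_name: str, found_config_file: str,
                        link_targets: list[str]) -> str:
    """Return the text of a hypothetical wrapper <tplName>Config.cmake.

    The wrapper includes the external package's own config file, found via
    the <tplName>_DIR variable, and then defines the INTERFACE IMPORTED target
    <tplName>::all_libs linking to the package's own targets.
    """
    lines = [
        f"# Hypothetical wrapper generated for external package {tpl_name}",
        f'include("${{{tpl_name}_DIR}}/{found_config_file}")',
        f"if (NOT TARGET {tpl_name}::all_libs)",
        f"  add_library({tpl_name}::all_libs INTERFACE IMPORTED)",
        f"  target_link_libraries({tpl_name}::all_libs",
        f"    INTERFACE {' '.join(link_targets)})",
        "endif()",
    ]
    return "\n".join(lines) + "\n"


def write_wrapper_config(build_dir: str, tpl_name: str, found_config_file: str,
                         link_targets: list[str]) -> Path:
    """Write <buildDir>/external_packages/<tplName>/<tplName>Config.cmake."""
    out_dir = Path(build_dir) / "external_packages" / tpl_name
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{tpl_name}Config.cmake"
    out_file.write_text(
        wrapper_config_text(tpl_name, found_config_file, link_targets))
    return out_file


if __name__ == "__main__":
    # Hypothetical package Foo whose own config file defines target Foo::foo.
    print(wrapper_config_text("Foo", "FooConfig.cmake", ["Foo::foo"]), end="")
```

Guarding add_library() with if (NOT TARGET ...) avoids redefining <tplName>::all_libs if the wrapper is included more than once; whether the included external config file itself tolerates repeated inclusion depends on that package.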

Again, all of this is a work in progress that will evolve as we complete #299 and #63.

@bartlettroscoe
Member Author

@mperrinel, thanks for your comments in Plan to merge TriBITS concepts and implementation of Packages and TPLs. I think I have responded to all of them and made a change based on one of them.

The most critical use case driving Epic #367 is "Use Case 3: Configure/build pointing to a subset of already installed TriBITS packages in same repo". But once we can implement that, the other use cases will be easier to implement when needed.

@bartlettroscoe
Member Author

@mperrinel and @marcinwrobel1986, as discussed at the meeting today, please comment in this issue or in the associated PR on any reviews you have done and what your impressions are.

@mperrinel
Collaborator

mperrinel commented Jan 24, 2022

@bartlettroscoe, concerning the issue (Checkbox 2: Clone Trilinos repo...): Spack was used to get Trilinos and its dependencies quickly, with most of its TPLs. Trilinos was installed using this Spack install command:
spack install --test=root --keep-stage --source --overwrite trilinos@develop +boost +mpi +zoltan +exodus +hdf5

  • --test=root runs the tests at the end of the Trilinos build. However, it only ensures that CTest is called for a CMake package; CTest doesn't do anything if no tests are built, which is the case for Trilinos because it is not built with tests by default.
  • --keep-stage keeps the Trilinos build
  • --source keeps the Trilinos source

By default, the stage is kept in a temporary directory, so even if you keep the build, it is automatically removed. To fix this problem, I edited a line in spack/etc/spack/defaults/config.yaml:

  # identifies Spack staging to avoid accidentally wiping out non-Spack work.
  build_stage: <$where_you_want_to_keep_your_build>

Replace <$where_you_want_to_keep_your_build> with a directory that is not temporary.
--overwrite rebuilds Trilinos itself every time you run the spack install command.

The default Trilinos version in Spack is 13.0.1, so it is necessary to add @develop to get the develop version. The variants enabled in this command provide these TPLs:
HWLOC MPI BLAS LAPACK Boost METIS ParMETIS Zlib HDF5 CGNS Pnetcdf Netcdf Matio DLlib 14

We can compare with the requested TPLs:
Pthread MPI BLAS LAPACK Boost Scotch ParMETIS Zlib HDF5 Netcdf SuperLU BoostLib DLlib

We can see that Pthread, Scotch, SuperLU, and BoostLib are missing, while HWLOC, METIS, CGNS, Pnetcdf, and Matio are extra TPLs.
For the moment, we don't think that having extra TPLs is a problem, so we focused on adding the missing ones. We decided to do this manually: we went into our Trilinos build directory and reran the CMake command with the missing options:

  1. Pthread and BoostLib were easily added with the options TPL_ENABLE_Pthread=ON and TPL_ENABLE_BoostLib=true
  2. With the option TPL_ENABLE_Scotch=ON, the Trilinos configuration complained about the missing variables TPL_Scotch_LIBRARIES and TPL_Scotch_INCLUDE_DIRS. To fix that, I installed Scotch through Spack:
    spack install --test=root --keep-stage --source --overwrite scotch@6.0.3 and then set the missing variables to paths in the Spack Scotch installation. libscotch.so, libptscotch.so, and libptscotcherr.so were the libraries that needed to be added.
  3. SuperLU was more difficult to add because the targeted version is 4.3, which is also the default version used by the corresponding Trilinos SuperLU variant, but it is deprecated in Spack, and when I tried it, I got two errors:
==> No binary for superlu-4.3-j5tghbtuinfhxh5ocyrahddlr7aazucq found: installing from source
==> Warning: superlu@4.3 is deprecated and may be removed in a future Spack release.

Superlu config error
->      21    /home/perrinel/Dev/spack/lib/spack/env/gcc/gcc -fPIC -o testtimer superlu_timer.o timertst.o
    22    Testing machines parameters and timer
    23    csh install.csh
    24    make[1]: csh: Command not found
 >> 25    make[1]: *** [Makefile:16: install.out] Error 127
    26    make[1]: Leaving directory '/home/perrinel/Dev/spack-stage/perrinel/spack-stage/spack-stage-superlu-4.3-j5tghbtuinfhxh5ocyrahddlr7aazucq/spack-src/INSTALL'
 >> 27    make: *** [Makefile:28: install] Error 2

I fixed this first error by installing the csh command with apt.

  ==> superlu: Executing phase: 'build'
==> Error: FileNotFoundError: [Errno 2] No such file or directory: '/home/perrinel/Dev/spack-stage/perrinel/spack-stage/spack-stage-superlu-4.3-j5tghbtuinfhxh5ocyrahddlr7aazucq/spack-build-j5tghbt'
/home/perrinel/Dev/spack/lib/spack/spack/build_systems/cmake.py:369, in build:
        368    def build(self, spec, prefix):
  >>    369        """Make the build targets"""
        370        with working_dir(self.build_directory):
        371            if self.generator == 'Unix Makefiles':
        372                inspect.getmodule(self).make(*self.build_targets)

I was not able to fix this error, so I decided to install SuperLU myself using the same source used by Spack:
wget https://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_4.3.tar.gz
To compile it, it was necessary to follow its README instructions and to modify the make.inc file, which is at the top of the SuperLU source tree. The compilation depends on BLAS, and you need to set the BLASLIB variable; I set it to the full path of the libopenblas.a library provided by Spack. In addition, Trilinos needs SuperLU to be compiled with -fPIC, so I had to add it to CFLAGS in the make.inc file:
CFLAGS = -DPRNTlevel=0 -O3 -fPIC

Once the compilation is done, the SuperLU installation can be used by Trilinos as a TPL with the options TPL_ENABLE_SuperLU=ON, TPL_SuperLU_INCLUDE_DIRS, and TPL_SuperLU_LIBRARIES. Only libsuperlu_4.3.a needs to be set in TPL_SuperLU_LIBRARIES.

In the end, with a few manual installations, all the requested TPLs are enabled in Trilinos.
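The manual TPL setup above boils down to three cache variables per TPL. A small Python sketch that assembles them as -D arguments might look like this; the helper tpl_cache_args and the install paths are hypothetical placeholders, multiple entries are joined with ';' to form a CMake list, and the library order is kept exactly as given.

```python
from __future__ import annotations


def tpl_cache_args(tpl_name: str, include_dirs: list[str],
                   libraries: list[str]) -> list[str]:
    """Build -D cache arguments pointing a TriBITS project at an installed TPL.

    Multiple entries are joined with ';' to form a CMake list. The order of
    `libraries` is kept as given, since it can matter on the link line.
    """
    return [
        f"-DTPL_ENABLE_{tpl_name}=ON",
        f"-DTPL_{tpl_name}_INCLUDE_DIRS={';'.join(include_dirs)}",
        f"-DTPL_{tpl_name}_LIBRARIES={';'.join(libraries)}",
    ]


if __name__ == "__main__":
    # Hypothetical install prefix for a hand-built SuperLU 4.3.
    prefix = "/path/to/SuperLU_4.3"
    for arg in tpl_cache_args("SuperLU", [f"{prefix}/include"],
                              [f"{prefix}/lib/libsuperlu_4.3.a"]):
        print(arg)
```

The resulting list can be passed to cmake from the Trilinos build directory, for example with subprocess.run(["cmake", *args, src_dir]). If the arguments are typed in a shell instead, values containing ';' must be quoted so the shell does not treat ';' as a command separator.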

@mperrinel
Collaborator

@bartlettroscoe, review of checkbox 5.3 (Using packages from the build tree...):
I found the document very clear. I would perhaps specify that target_link_libraries() is what is responsible for "Note that in this case, the include directories and other imported compiler options from the source tree and the build tree are automatically injected into the build targets...", but that is really a detail.

@bartlettroscoe
Member Author


@mperrinel, can you put in a PR with your suggested edits?

@bartlettroscoe
Member Author

By default, the stage is kept in a temporary directory, so even if you keep the build, it is automatically removed. To fix this problem, I edited a line in spack/etc/spack/defaults/config.yaml:

  # identifies Spack staging to avoid accidentally wiping out non-Spack work.
  build_stage: <$where_you_want_to_keep_your_build>

@mperrinel, can you please push a branch to your fork of Spack with this and any other changes to Spack? Editing Spack's own defaults file this way violates the Open-Closed Principle (OCP).

@bartlettroscoe
Member Author

@mperrinel and @keitat,

Question: what do we do about the build and test failures for the Spack Trilinos configuration shown here? We see a build error in Amesos2_KLU2_UnitTests and 9 test failures with Zoltan. If we don't fix them, we should disable them.

It is important that we can automate the building and testing of this configuration so we can use it to test updates of TriBITS. And we need a 100% passing reference build of Trilinos for that purpose.

@mperrinel
Collaborator

mperrinel commented Feb 2, 2022

@bartlettroscoe @keitat
I want to create a new reference dashboard using SuperLU 5.3 instead of SuperLU 4.3.
First, I had to change the default version of SuperLU in the Spack package.py file for Trilinos (4.3 -> 5.3).
With that, the SuperLU dependency was automatically installed by Spack.
Unfortunately, Trilinos itself doesn't compile anymore (because of the SuperLU dependency). This is the error:

  6303    /home/perrinel/Dev/spack-stage2/spack-stage/spack-stage-trilinos-develop-eh4mamlqrifnbkl2wrh4vhzysyg2u2yo/spack-src/packages/amesos/src/Amesos_Superlu.cpp: In member function 'virtual int Amesos_Superlu::NumericFactorization()':
  >> 6304    /home/perrinel/Dev/spack-stage2/spack-stage/spack-stage-trilinos-develop-eh4mamlqrifnbkl2wrh4vhzysyg2u2yo/spack-src/packages/amesos/src/Amesos_Superlu.cpp:490:13: error: cannot convert 'SLU::mem_usage_t*' to 'SLU::GlobalLU_t*'
     6305      490 |             &(data_->mem_usage), &SLU_stat, &Ierr[0] );
     6306          |             ^~~~~~~~~~~~~~~~~~~
     6307          |             |
     6308          |             SLU::mem_usage_t*
     6309    In file included from /home/perrinel/Dev/spack-stage2/spack-stage/spack-stage-trilinos-develop-eh4mamlqrifnbkl2wrh4vhzysyg2u2yo/spack-src/packages/amesos/src/Amesos_Superlu.cpp:45:
     6310    /home/perrinel/Dev/spack/opt/spack/linux-ubuntu20.04-skylake/gcc-9.3.0/superlu-5.3.0-glyckr7lylaepepp2kivxgqzqgipdouj/include/slu_ddefs.h:115:8: note:   initializing argument 19 of 'void SLU::dgssvx(SLU::superlu_options_t*, SLU::SuperMatr
             ix*, int*, int*, int*, char*, double*, double*, SLU::SuperMatrix*, SLU::SuperMatrix*, void*, int, SLU::SuperMatrix*, SLU::SuperMatrix*, double*, double*, double*, double*, SLU::GlobalLU_t*, SLU::mem_usage_t*, SLU::SuperLUStat_t*, int*)'
     6311      115 |        GlobalLU_t *, mem_usage_t *, SuperLUStat_t *, int *);
     6312          |        ^~~~~~~~~~~~
     6313    /home/perrinel/Dev/spack-stage2/spack-stage/spack-stage-trilinos-develop-eh4mamlqrifnbkl2wrh4vhzysyg2u2yo/spack-src/packages/amesos/src/Amesos_Superlu.cpp: In member function 'virtual int Amesos_Superlu::Solve()':
  >> 6314    /home/perrinel/Dev/spack-stage2/spack-stage/spack-stage-trilinos-develop-eh4mamlqrifnbkl2wrh4vhzysyg2u2yo/spack-src/packages/amesos/src/Amesos_Superlu.cpp:626:13: error: cannot convert 'SLU::mem_usage_t*' to 'SLU::GlobalLU_t*'
     6315      626 |             &(data_->mem_usage), &SLU_stat, &Ierr);
     6316          |             ^~~~~~~~~~~~~~~~~~~
     6317          |             |
     6318          |             SLU::mem_usage_t*
     6319    In file included from /home/perrinel/Dev/spack-stage2/spack-stage/spack-stage-trilinos-develop-eh4mamlqrifnbkl2wrh4vhzysyg2u2yo/spack-src/packages/amesos/src/Amesos_Superlu.cpp:45:
     6320    /home/perrinel/Dev/spack/opt/spack/linux-ubuntu20.04-skylake/gcc-9.3.0/superlu-5.3.0-glyckr7lylaepepp2kivxgqzqgipdouj/include/slu_ddefs.h:115:8: note:   initializing argument 19 of 'void SLU::dgssvx(SLU::superlu_options_t*, SLU::SuperMatr
             ix*, int*, int*, int*, char*, double*, double*, SLU::SuperMatrix*, SLU::SuperMatrix*, void*, int, SLU::SuperMatrix*, SLU::SuperMatrix*, double*, double*, double*, double*, SLU::GlobalLU_t*, SLU::mem_usage_t*, SLU::SuperLUStat_t*, int*)'
     6321      115 |        GlobalLU_t *, mem_usage_t *, SuperLUStat_t *, int *);
     6322          |        ^~~~~~~~~~~~
  >> 6323    make[2]: *** [packages/amesos/src/CMakeFiles/amesos.dir/build.make:149: packages/amesos/src/CMakeFiles/amesos.dir/Amesos_Superlu.cpp.o] Error 1
     6324    make[2]: *** Waiting for unfinished jobs....
     6325    [ 79%] Building CXX object packages/tpetra/core/src/CMakeFiles/tpetra.dir/Tpetra_CrsMatrix_LONG_LONG_INT_LONG_LONG_SERIAL.cpp.o
     6326    cd /home/perrinel/Dev/spack-stage2/spack-stage/spack-stage-trilinos-develop-eh4mamlqrifnbkl2wrh4vhzysyg2u2yo/spack-build-eh4maml/packages/tpetra/core/src && /home/perrinel/Dev/spack/opt/spack/linux-ubuntu20.04-skylake/gcc-9.3.0/openmpi-4.
             1.2-dbbo72yb7ys57hq625tytdzczn4s2fym/bin/mpic++ -Dtpetra_EXPORTS -I/home/perrinel/Dev/spack-stage2/spack-stage/spack-stage-trilinos-develop-eh4mamlqrifnbkl2wrh4vhzysyg2u2yo/spack-build-eh4maml -I/home/perrinel/Dev/spack-stage2/spack-stage
             /spack-stage-trilinos

@mperrinel
Collaborator

@bartlettroscoe
Concerning the violation of the Open-Closed Principle (OCP): I found another way to change the build stage without having to edit spack/etc/spack/defaults/config.yaml. I exported the TMP variable in my terminal, pointing it to a location that is not temporary. That works too. Is that a better option?

@mperrinel
Collaborator

@bartlettroscoe Review of checkboxes 5.4 and 5.5

  1. The build instructions in 5.4 work perfectly.
  2. The optional subpackages B and C aren't compiled by default, even with the option TribitsExProj_ENABLE_ALL_OPTIONAL_PACKAGES=ON. The reason is that TribitsExProj_ENABLE_SECONDARY_TESTED_CODE=OFF by default.
  3. The optional subpackages B and C compile if we enable TribitsExProj_ENABLE_SECONDARY_TESTED_CODE.
  4. The default instructions say to run make and ctest, but not make install. The install step is necessary for the app example defined in 5.5.
  5. The app example in 5.5 doesn't compile if we follow the default instructions for its dependencies (5.4). The problem is that the app depends on the optional subpackage C, which is not enabled by default in 5.4.
  6. Once subpackage C was enabled in 5.4 with TribitsExProj_ENABLE_SECONDARY_TESTED_CODE=ON, the app example compiled perfectly.

@bartlettroscoe
Member Author

I exported the TMP variable in my terminal, pointing it to a location that is not temporary. That works too. Is that a better option?

@mperrinel, as long as you can document it and it is repeatable, that seems like an acceptable solution. But it seems that you would need to reset TMP just for the Spack build and then put it back again to limit the impact this has.
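One way to reset TMP just for the Spack build, assuming (as reported above) that Spack picks up its stage location from TMP, is to hand a modified copy of the environment to the spack child process only, so the terminal's own TMP is never changed. This is a minimal Python sketch; the helper names spack_env and run_spack_install and the stage path are hypothetical.

```python
from __future__ import annotations

import os
import subprocess


def spack_env(stage_root: str,
              base: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of `base` (default: os.environ) with TMP set to stage_root.

    Only the copy is modified, so the calling shell's TMP stays untouched.
    """
    env = dict(os.environ if base is None else base)
    env["TMP"] = stage_root
    return env


def run_spack_install(spec_args: list[str], stage_root: str) -> int:
    """Run `spack install <spec_args...>` with TMP overridden for that process only."""
    completed = subprocess.run(["spack", "install", *spec_args],
                               env=spack_env(stage_root))
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical non-temporary stage location.
    rc = run_spack_install(["--keep-stage", "trilinos@develop"],
                           "/home/me/spack-stage-keep")
    print("spack install exited with", rc)
```

In a POSIX shell, prefixing the command (TMP=/some/stage/dir spack install ...) has the same per-command effect. If Spack resolves its temporary directory through Python's tempfile module (not verified here), an already-set TMPDIR or TEMP would take precedence over TMP.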

@bartlettroscoe
Member Author

The default instructions say to run make and ctest, but not make install. The install step is necessary for the app example defined in 5.5.

@mperrinel, the instructions in TribitsExampleApp/README.md assume that you have installed TribitsExampleProject first as per:

To build against all of the installed packages from an upstream TribitsExampleProject, configure, build, and run the tests with:

Would it help if this was changed to:

First configure, build, and install TribitsExampleProject under <upstreamInstallDir>. Then configure against this install, build, and run the tests for TribitsExampleApp with:

?

@bartlettroscoe
Member Author

Subpackages optional B and C compile if we enable TribitsExProj_ENABLE_SECONDARY_TESTED_CODE

@mperrinel, correct. To make this simpler, we should perhaps just set TribitsExProj_ENABLE_SECONDARY_TESTED_CODE=ON by default? But then I think you can't build by default without a Fortran compiler. Perhaps we just need to document this better?

@mperrinel
Collaborator

@bartlettroscoe Review of checkbox 5.6
The instructions work perfectly.

@bartlettroscoe
Member Author

The app example associated to 5.5 doesn't compile by default if we follow the default instruction of its dependencies (5.4). The problem is that the app depends on the C optional subpackage which is not activated by default in 5.4

@mperrinel, okay. Then let's update TribitsExampleProject/README.md to recommend explicitly setting:

-D TribitsExProj_ENABLE_SECONDARY_TESTED_CODE=ON

and mention that this will require Fortran. We will also need to update TribitsExampleApp/README.md to state that TribitsExampleProject needs to be configured with TribitsExProj_ENABLE_SECONDARY_TESTED_CODE=ON.

@mperrinel
Collaborator

mperrinel commented Feb 2, 2022

@mperrinel, okay. Then let's update TribitsExampleProject/README.md to recommend explicitly setting:

@bartlettroscoe
Sure, I can create a PR for that if you want. We should mention the Fortran requirement in the first section of 5.5 (TribitsExampleApp) too.
The phrase "build against all of the installed packages" is also confusing, because it refers not only to the packages that are already installed but also to the other packages that could be installed from TribitsExampleProject. I think the proposed change:

First configure, build, and install TribitsExampleProject under <upstreamInstallDir>. Then configure against this install, build, and run the tests for TribitsExampleApp

is correct. In addition, we should mention here that subpackage C of TribitsExampleProject requires Fortran.

@bartlettroscoe
Member Author

bartlettroscoe commented Feb 3, 2022

Unfortunately, Trilinos itself doesn't compile anymore (because of SuperLU dependency). This is the error :

@mperrinel, if no one is testing Trilinos with recent versions of SuperLU and SuperLU is not maintaining backward compatibility, then we should not be surprised that updated versions of Trilinos and SuperLU don't work together.

@keitat, what is the status of ECP-xSDK testing of Trilinos and SuperLU?

@marcinwrobel1986
Collaborator

Hello @bartlettroscoe @mperrinel,
I just created a PR with README improvements.
Please feel free to take a look, comment, and edit.
I also went through the whole process.
We could mark items 5.4, 5.5, and 5.6 as done in the issue.

@keitat
Collaborator

keitat commented Feb 3, 2022

I confirmed the error with Amesos. Can you change back to superlu@4.3? I will report it to the Amesos team.


@bartlettroscoe
Member Author

bartlettroscoe commented Feb 10, 2022

FYI: Looking at what SPARC is currently using with Trilinos, their build [Trilinos-atdm-cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt](https://testing.sandia.gov/cdash/build/10577083) shows the TPLs being used by ATDM:

Final set of enabled TPLs:  BinUtils MPI BLAS LAPACK Boost METIS ParMETIS HDF5 CGNS Netcdf SuperLUDist BoostLib DLlib 13

with SuperLUDist version 5.4.0:

Processing enabled TPL: SuperLUDist (enabled explicitly, disable with -DTPL_ENABLE_SuperLUDist=OFF)
-- SuperLUDist_LIBRARY_NAMES='superludist superlu_dist superlu_dist_2.0 superlu_dist_2.5 superlu_dist_4.0'
-- TPL_SuperLUDist_LIBRARIES='/projects/sparc/tpls/cee-rhel7/superludist-5.4.0/356d811f273c782cfbc9b4babc5ceea642900641/cee-cpu_clang-9.0.1_openmpi-4.0.3/lib64/libsuperlu_dist.a'
-- Searching for headers in SuperLUDist_INCLUDE_DIRS='/projects/sparc/tpls/cee-rhel7/superludist-5.4.0/356d811f273c782cfbc9b4babc5ceea642900641/cee-cpu_clang-9.0.1_openmpi-4.0.3/include'
-- Searching for a header file in the set "superlu_defs.h superludefs.h":
--   Searching for header 'superlu_defs.h' ...
--     Found header '/projects/sparc/tpls/cee-rhel7/superludist-5.4.0/356d811f273c782cfbc9b4babc5ceea642900641/cee-cpu_clang-9.0.1_openmpi-4.0.3/include/superlu_defs.h'
-- Searching for a header file in the set "supermatrix.h":
--   Searching for header 'supermatrix.h' ...
--     Found header '/projects/sparc/tpls/cee-rhel7/superludist-5.4.0/356d811f273c782cfbc9b4babc5ceea642900641/cee-cpu_clang-9.0.1_openmpi-4.0.3/include/supermatrix.h'
-- Found TPL 'SuperLUDist' include dirs '/projects/sparc/tpls/cee-rhel7/superludist-5.4.0/356d811f273c782cfbc9b4babc5ceea642900641/cee-cpu_clang-9.0.1_openmpi-4.0.3/include'
-- TPL_SuperLUDist_INCLUDE_DIRS='/projects/sparc/tpls/cee-rhel7/superludist-5.4.0/356d811f273c782cfbc9b4babc5ceea642900641/cee-cpu_clang-9.0.1_openmpi-4.0.3/include'
-- Performing Test HAVE_SUPERLUDIST_ENUM_NAMESPACE
-- Performing Test HAVE_SUPERLUDIST_ENUM_NAMESPACE - Success
-- Performing Test HAVE_SUPERLUDIST_LUSTRUCTINIT_2ARG
-- Performing Test HAVE_SUPERLUDIST_LUSTRUCTINIT_2ARG - Success

So who is actually using the SuperLU TPL through Trilinos?

@mperrinel
Collaborator

@bartlettroscoe Review of checkbox 6.1:

  1. We can see that the order of the libs set in TPL_${tplName}_LIBRARIES is very important because it determines the dependencies between them. Is it possible to define several libs that have no dependencies between them?
  2. The order of the directories defined in TPL_${tplName}_INCLUDE_DIRS is not important.
  3. The target all_libs is created only if TPL_${tplName}_INCLUDE_DIRS is set (even to an empty value), and it links to the first library defined in TPL_${tplName}_LIBRARIES, or to nothing if this variable is empty.

@bartlettroscoe
Member Author

@mperrinel

  • I think that the document TribitsExternalPackageWriteConfigFile looks good to me. I saw some spelling mistakes in the documentation (from the varaibles; for the list of targts given...). I can create a PR to fix them if you want.

Yes, please create a PR with suggested fixes

Is it possible to define different libs but without any dependencies between them ?

It is harmless to define dependencies if there really are none. To be safe, we just need to maintain the ordering of the libs provided on the link line in case that is significant. That is really all the code is doing, since we can't actually link between libraries that have already been created.

The order of the directories defined TPL_${tplName}_INCLUDE_DIRS is not important.

That is not true. You have to maintain the order provided and CMake will maintain the order of the include dirs that you give it with this list.

The target all_libs is created only if TPL_${tplName}_INCLUDE_DIRS is set (even with an empty value) and links to the first library defined in TPL_${tplName}_LIBRARIES or nothing if this variable is empty.

I don't think that is how it works. The include dirs are set on the INTERFACE target <tplName>::all_libs, which links to all of the libs for a TPL.
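As a rough illustration of both ordering points, here is a minimal generic CMake sketch, not the actual TriBITS implementation, for a hypothetical TPL named Foo whose library libfoo_a.a depends on libfoo_b.a. Each library file becomes an IMPORTED target, the downstream library declares a link dependency on the upstream one so the order from TPL_Foo_LIBRARIES is preserved on the link line, and an INTERFACE Foo::all_libs target carries the include directories in the order given. The TPL name, target names, and /opt/foo paths are all made up for illustration.

```cmake
cmake_minimum_required(VERSION 3.17)
project(AllLibsSketch LANGUAGES NONE)

# Hypothetical TPL "Foo": libfoo_a.a uses symbols from libfoo_b.a, so it is
# listed first, the same way the libs would be ordered in TPL_Foo_LIBRARIES.
set(TPL_Foo_LIBRARIES "/opt/foo/lib/libfoo_a.a;/opt/foo/lib/libfoo_b.a")
# Include dirs are searched in the order given, so that order is kept as-is.
set(TPL_Foo_INCLUDE_DIRS "/opt/foo/include;/opt/foo/include/extra")
list(GET TPL_Foo_LIBRARIES 0 fooALibFile)
list(GET TPL_Foo_LIBRARIES 1 fooBLibFile)

# One IMPORTED target per library file (placeholder paths; they are never
# opened because nothing in this sketch links against these targets).
add_library(Foo::foo_b STATIC IMPORTED)
set_target_properties(Foo::foo_b PROPERTIES IMPORTED_LOCATION "${fooBLibFile}")
add_library(Foo::foo_a STATIC IMPORTED)
set_target_properties(Foo::foo_a PROPERTIES IMPORTED_LOCATION "${fooALibFile}")

# No real linking between the already-built libs happens here; this only
# records that libfoo_a.a must come before libfoo_b.a on a consumer's link line.
target_link_libraries(Foo::foo_a INTERFACE Foo::foo_b)

# Umbrella target: carries the include dirs (in order) and, transitively, all libs.
add_library(Foo::all_libs INTERFACE IMPORTED)
target_link_libraries(Foo::all_libs INTERFACE Foo::foo_a)
target_include_directories(Foo::all_libs INTERFACE ${TPL_Foo_INCLUDE_DIRS})
```

Here Foo::all_libs links directly only to the first lib and pulls in the rest transitively; whether TriBITS wires its targets exactly this way should be checked against TribitsExternalPackageWriteConfigFile.cmake itself.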

@mperrinel
Collaborator

@bartlettroscoe
I understand about the order of TPL_${tplName}_INCLUDE_DIRS.
From what I understand of the file TribitsExternalPackageWriteConfigFile.cmake, the target all_libs is always created:

# C) Create the <tplName>::all_libs target
  tribits_external_package_create_all_libs_target(

So I don't really understand why it is not always visible in the tests. What is the condition for writing the all_libs target? Why is it not visible in this test: function(unittest_tribits_external_package_process_libraries_list_incl_dirs_0_lib_files_3)

@bartlettroscoe
Member Author

Why is not visible in this test : function(unittest_tribits_external_package_process_libraries_list_incl_dirs_0_lib_files_3)

@mperrinel, because that unit test just calls the function tribits_external_package_process_libraries_list() which just processes the list of libraries in TPL_${tplName}_LIBRARIES. It does not create the ${tplName}::all_libs target. It is a unit test.

@mperrinel
Collaborator

@bartlettroscoe OK, thanks, it all makes sense now.

@mperrinel
Collaborator

@bartlettroscoe I opened the PR for the 6.1 spelling mistakes.

@mperrinel
Collaborator

mperrinel commented Feb 16, 2022

@bartlettroscoe @keitat Progress on 5.2:
I built Trilinos using SuperLU-dist (an available variant of the Spack Trilinos package) and did not add any other SuperLU. In addition, as suggested, I disabled SuperLU for Amesos: -D Amesos_ENABLE_SuperLU=OFF

This is my final list of enabled TPLs:
Final set of enabled TPLs: Pthread HWLOC MPI BLAS LAPACK Boost Scotch METIS ParMETIS Zlib HDF5 CGNS Pnetcdf Netcdf SuperLUDist BoostLib Matio DLlib 18

The compilation succeeded, but the ctest run done by make dashboard did not:
CDash Report
Result: 1 error and 10 test failures with Zoltan

bartlettroscoe added a commit that referenced this issue Feb 16, 2022
@bartlettroscoe
Member Author

bartlettroscoe commented Feb 16, 2022

Result : 1 error and 10 tests failures with Zoltan

@mperrinel, can you dig into these errors some and see if you can post Trilinos GitHub issues for that build on CDash? The reproducibility instructions should involve the branch of spack that you are using to generate these results. (That is the whole point of this exercise, to allow consistent reproductions of builds of Trilinos.)

Do you know how to use the "Test Output" filter with the CDash queryTests.php page to characterize the test failures for this build quickly and compactly? See some examples of this in the initial analysis of the failures below.

Initial analysis of Zoltan test failures


First, let's start with all of the failing tests for this build:

showing:

image

With a little poking around and using the filter ["Test Output", "include", "Diff failed"] with the query:

showing:

image

you can see that 8 of the 9 failing Zoltan tests are diffs. If you click "Show Matching Output" on the upper right, you can see some of the details (or at least the context) of the diffs.

And the query:

showing:

image

shows that one Zoltan test, Zoltan_hg_cage10_zoltan_parallel, fails for a reason other than a diff.

The query:

showing:

image

shows that one test is failing due to "Missing output files".

The problem with the latter test, Zoltan_hg_cage10_zoltan_parallel, is:

There are not enough slots available in the system to satisfy the 11
slots that were requested by the application:

  ../zdrive.exe

Either request fewer slots for your application, or make more slots
available for use.

It looks like that error is coming from OpenMPI mpirun itself as per:

I think you need to add the argument --oversubscribe to your mpirun invocation. See the cache var MPI_EXEC_PRE_NUMPROCS_FLAGS at:

This is getting detailed enough that we should likely create a Trilinos GitHub issue for these failures and move the discussion there.
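As a minimal sketch of the --oversubscribe fix, one option is a CMake initial-cache file loaded with cmake -C. This assumes, as the variable name suggests, that TriBITS inserts MPI_EXEC_PRE_NUMPROCS_FLAGS into the mpirun command before the number-of-processes flag; the file name oversubscribe.cmake and the cache docstring are hypothetical.

```cmake
# oversubscribe.cmake (hypothetical name); load it when configuring with:
#   cmake -C oversubscribe.cmake [other options] <trilinos-src-dir>
# --oversubscribe is an Open MPI mpirun option that allows a test to request
# more processes (11 for Zoltan_hg_cage10_zoltan_parallel) than available slots.
set(MPI_EXEC_PRE_NUMPROCS_FLAGS "--oversubscribe" CACHE STRING
  "Extra flags passed to mpirun before the number-of-processes flag")
```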

@keitat
Collaborator

keitat commented Feb 17, 2022

-D Amesos_ENABLE_SuperLU=OFF and use any version of SuperLU that works (i.e. newest version which Amesos2 can handle)

package.py for Trilinos does not provide an equivalent option. A quick workaround is to add a few Python statements for this option.

Need a process to automate package.py generation.

@bartlettroscoe
Member Author

Need a process to automate to create package.py generation.

For now, you can just create a branch in the spack repo and make the changes right there.

@keitat
Collaborator

keitat commented Feb 17, 2022

I will just add a few statements and submit a PR to the Spack team (or use my own Spack fork). @mperrinel I will let you know as soon as it is ready.

@mperrinel
Copy link
Collaborator

Thanks @keitat
@bartlettroscoe I created an issue on the Trilinos GitHub:
trilinos/Trilinos#10235

@keitat
Collaborator

keitat commented Mar 3, 2022

With Spack we can install Trilinos without Amesos. However, I recommend building Trilinos without Spack. @bartlettroscoe , what Trilinos packages do you activate?
spack install trilinos@13.2.0~amesos+superlu

@bartlettroscoe
Member Author

@bartlettroscoe , what Trilinos packages do you activate?

Just about all of them (or at least all of them that are enabled in Trilinos PR testing).

@bartlettroscoe
Member Author

At this point I think we should consider abandoning the Spack build of TPLs for Trilinos to allow NGA to reproduce Trilinos builds and tests with updated TriBITS versions. Instead, we should descope and just have NGA run TriBITS test suite and review TriBITS PRs and Issues (and provide comments for related Trilinos PRs).

Let's discuss at the meeting today.

@bartlettroscoe
Member Author

NOTE: Trilinos PR testing is using SuperLU 4.3. For example, see here showing:

Processing enabled TPL: SuperLU (enabled explicitly, disable with -DTPL_ENABLE_SuperLU=OFF)
<...>
--   Searching for lib 'superlu' ...
--     Found lib '/projects/sems/install/rhel7-x86_64/sems/tpl/superlu/4.3/clang/10.0.0/base/lib/libsuperlu.a'
-- TPL_SuperLU_LIBRARIES='/projects/sems/install/rhel7-x86_64/sems/tpl/superlu/4.3/clang/10.0.0/base/lib/libsuperlu.a'
-- Searching for headers in SuperLU_INCLUDE_DIRS='/projects/sems/install/rhel7-x86_64/sems/tpl/superlu/4.3/clang/10.0.0/base/include'
<...>
-- Searching for a header file in the set "supermatrix.h":
--   Searching for header 'supermatrix.h' ...
--     Found header '/projects/sems/install/rhel7-x86_64/sems/tpl/superlu/4.3/clang/10.0.0/base/include/supermatrix.h'
-- Searching for a header file in the set "slu_ddefs.h":
--   Searching for header 'slu_ddefs.h' ...
--     Found header '/projects/sems/install/rhel7-x86_64/sems/tpl/superlu/4.3/clang/10.0.0/base/include/slu_ddefs.h'
-- Found TPL 'SuperLU' include dirs '/projects/sems/install/rhel7-x86_64/sems/tpl/superlu/4.3/clang/10.0.0/base/include'
-- TPL_SuperLU_INCLUDE_DIRS='/projects/sems/install/rhel7-x86_64/sems/tpl/superlu/4.3/clang/10.0.0/base/include'
<...>

But we should abandon this effort and just skip NGA testing with Trilinos.

@bartlettroscoe
Member Author

@MikolajZuzek, please look at documentation and files listed above.

@mperrinel
Collaborator

mperrinel commented Mar 14, 2022

@bartlettroscoe @keitat. Please find attached the documentation requested
TrilinosBuildWithSpack.md

@bartlettroscoe
Member Author

@mperrinel, there are some formatting problems with the rendered Markdown (as per the Markdown Viewer Chrome Extension). It shows:

image

Can you fix the formatting?

@mperrinel
Collaborator

Ok @bartlettroscoe, it is updated

@bartlettroscoe
Member Author

Let's call this done

TriBITS Refactor automation moved this from In Progress to Done Mar 24, 2022
Projects
Status: Done
Development

No branches or pull requests

5 participants