Update CMake build of ExaTensor #1

amccaskey · 2019-07-18T19:01:39Z

We need to update tpls/CMakeLists.txt to provide a more robust build of the ExaTensor submodule.

amccaskey · 2019-07-20T16:56:51Z

I've been thinking about this. We need to set it up so that CMake uses the ExaTensor environment variables by default, i.e. you set them when invoking cmake, for example BLASLIB=... cmake .. You can read them with the CMake $ENV{BLASLIB} syntax. So check whether each variable is set, and if not, fall back to defaults for the given platform.
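A minimal sketch of that check, meant as a fragment for an existing CMakeLists.txt. The variable name EXATENSOR_BLASLIB and both default values are placeholders chosen for illustration, not ExaTensor's actual defaults.

```cmake
# Prefer the ExaTensor environment variable if the user set it,
# e.g. via:  BLASLIB=MKL cmake ..
if(DEFINED ENV{BLASLIB})
  set(EXATENSOR_BLASLIB "$ENV{BLASLIB}")
elseif(APPLE)
  # Hypothetical macOS default, for illustration only.
  set(EXATENSOR_BLASLIB "NONE")
else()
  # Hypothetical Linux default, for illustration only.
  set(EXATENSOR_BLASLIB "ATLAS")
endif()

# Report the choice so the user can see what was assumed.
message(STATUS "ExaTensor BLASLIB = ${EXATENSOR_BLASLIB}")
```

The same pattern would repeat for each environment variable the ExaTensor build reads.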

osbornjd · 2019-07-22T17:43:54Z

Okay, so I guess this is more or less what is already done for the OSX build, if I understand correctly. Currently in the ExaTensor CMake file, the OSX build takes e.g. the CPP compiler from an environment variable in these lines, whereas the Linux targets, e.g. in this line, have the values hard-coded. So each of these hard-coded values should instead reference a variable, similar to the current OSX target lines.

amccaskey · 2019-07-22T17:48:06Z

Yes that's right.

DmitryLyakh · 2019-07-23T14:56:14Z

From what I see now in the build system, the ExaTensor MPI path is imported from CMake, which is good. However, linking BLAS generally involves more than one path, depending on the BLAS implementation. ExaTensor currently supports four BLAS libraries: ATLAS, MKL, ACML, and ESSL. Each of them requires its own way of setting one or more paths for linking. Ideally, we should import this information from CMake as well and pass it to ExaTensor. Otherwise, we will have to configure ExaTensor separately, at the risk of mixing different BLAS libraries (e.g., ExaTensor built with MKL while exatn links against ATLAS).

DmitryLyakh · 2019-07-23T15:02:13Z

Also, I think it is better to make EXA_TALSH_ONLY=YES the default in the exatn build system. I sent an email last week suggesting the following build options (from the simplest to the hardest):

1. Single-node, no MPI (i.e., no symbolic interface), TALSH only.
2. Single-node, MPI (i.e., symbolic interface on as well), TALSH only.
3. Multi-node, MPI (i.e., symbolic interface on as well), full ExaTensor.

The bottom line is that in the majority of cases (C++/Python workstation users), we do not need MPI at all and can restrict ExaTensor to TALSH only. This will eliminate enormous headaches with people complaining about installing their MPI implementation and picking the right compilers.

DmitryLyakh · 2019-07-23T15:07:16Z

Implementation-wise, I would suggest the following:

If the user does not explicitly choose the MPI library during CMAKE configure, assume no MPI at all.
If the user does not explicitly specify BLAS environment variables for ExaTensor, try to import the necessary BLAS paths from CMAKE. If unable, set BLASLIB=NONE.
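One way these two rules could look in CMake, as a sketch only. The option name EXATN_USE_MPI and the variables EXATENSOR_BLASLIB and EXATENSOR_BLAS_LIBRARIES are hypothetical. Mapping the BLAS that CMake finds onto one of the ExaTensor vendor names (ATLAS, MKL, ACML, ESSL) is not shown.

```cmake
# Rule 1: no MPI unless the user explicitly asks for it.
# EXATN_USE_MPI is a hypothetical option name.
option(EXATN_USE_MPI "Build with MPI support" OFF)
if(EXATN_USE_MPI)
  find_package(MPI REQUIRED)
endif()

# Rule 2: the user's environment wins; otherwise ask CMake; otherwise NONE.
if(DEFINED ENV{BLASLIB})
  set(EXATENSOR_BLASLIB "$ENV{BLASLIB}")
else()
  find_package(BLAS)  # not REQUIRED: a missing BLAS is allowed
  if(BLAS_FOUND)
    # BLAS_LIBRARIES may list several libraries, depending on the vendor.
    set(EXATENSOR_BLAS_LIBRARIES "${BLAS_LIBRARIES}")
    message(STATUS "BLAS imported from CMake: ${BLAS_LIBRARIES}")
  else()
    set(EXATENSOR_BLASLIB "NONE")
    message(STATUS "No BLAS found; building ExaTensor with BLASLIB=NONE")
  endif()
endif()
```

A user who wants MPI would then configure with cmake .. -DEXATN_USE_MPI=ON, and everyone else gets the MPI-free build.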

osbornjd · 2019-07-23T15:12:13Z

Okay, I think this makes sense - the issue is ensuring that user defined implementations of MPI/BLAS don't conflict with each other in exatn and exatensor. Additionally, it should be ensured that user definitions don't conflict with default CMake assumptions.

> The bottom line is that in the majority of cases (C++/Python workstation users), we do not need MPI at all and can restrict ExaTensor to TALSH only. This will eliminate enormous headaches with people complaining about installing their MPI implementation and picking the right compilers.

I certainly agree with this. When I started building on my OSX environment last week, part of the struggle was making sure that MPI and exatn were using matching compilers.

> Implementation-wise, I would suggest the following:
>
> If the user does not explicitly choose the MPI library during CMAKE configure, assume no MPI at all.
>
> If the user does not explicitly specify BLAS environment variables for ExaTensor, try to import the necessary BLAS paths from CMAKE. If unable, set BLASLIB=NONE.

I will work on this implementation. We should also document this in the README so that it is clear to the user what the CMake configuration assumes when environment variables are or are not set.

osbornjd · 2019-07-23T17:22:48Z

> If the user does not explicitly choose the MPI library during CMAKE configure, assume no MPI at all.

By the way, at the moment the build requires MPI and BLAS to be found, see this line. I assume then that this can be changed in order to implement your suggestion? I'm not sure if these lines were put here for a historical reason, or something else.

amccaskey · 2019-07-23T17:25:48Z

You can remove the REQUIRED arg on those

amccaskey · 2019-07-23T17:26:24Z

But look through the other CMakeLists files and make sure that wherever something like MPI_CXX_INCLUDE_DIRS is used, you wrap it with if(MPI_FOUND) or something like that.
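A sketch of that guard, assuming the target is named by ${LIBRARY_NAME} as in the other CMakeLists files and has already been created with add_library:

```cmake
# MPI is optional now: no REQUIRED argument.
find_package(MPI)

# Only touch MPI variables when MPI was actually found; otherwise
# MPI_CXX_INCLUDE_DIRS and MPI_CXX_LIBRARIES are not usable.
if(MPI_FOUND)
  target_include_directories(${LIBRARY_NAME} PUBLIC ${MPI_CXX_INCLUDE_DIRS})
  target_link_libraries(${LIBRARY_NAME} PUBLIC ${MPI_CXX_LIBRARIES})
endif()
```

Without the guard, a build configured without MPI would pass empty or NOTFOUND values to these commands.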

osbornjd · 2019-07-23T17:27:39Z

Right, this is why I asked: removing these lines completely, for example, causes the build to fail, likely for the reasons you suggest.

DmitryLyakh · 2019-07-30T22:55:34Z

In branch devel_dil, I plugged the TAL-SH backend into the tensor runtime package, and most tests failed at the linking step because our CMake files are not configured to handle the dependencies of the TAL-SH library: the CUDA libraries (if CUDA is enabled) and the OpenMP runtime library (compiler specific). The devel_dil branch currently does not build. This is a blocker for me, as I can't continue development until this works. We need to somehow resolve the systematic misconfiguration of the CMake files throughout. I began fixing the CUDA libraries part, but realized that I have no idea how to add the OpenMP runtime library, which is compiler specific. I see a number of places where we hard-coded libgomp, but this is GNU only and will fail with other compilers. Any ideas on how to fix this systematically throughout would be appreciated.

amccaskey · 2019-07-30T23:10:28Z

I wouldn't call this a misconfiguration of CMake. This is a new addition, and we just have to update the CMake build system to enable it. There is no systematic way to avoid these issues. We just have to implement the CMake code to enable new additions to the build as they come online.

As for OpenMP, cmake provides find_package(OpenMP). In order to see a description of this module, or any other module, you can run

cmake --help-module FindOpenMP

@osbornjd can help address these link issues tomorrow.
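As a sketch of how find_package(OpenMP) can replace a hard-coded libgomp: the fragment below assumes CMake 3.9 or newer (which provides the OpenMP::OpenMP_CXX imported target) and an existing target named by ${LIBRARY_NAME}.

```cmake
# Let CMake work out the OpenMP flags and runtime for the active compiler
# (libgomp for GNU, a different runtime for other compilers).
find_package(OpenMP)

if(OpenMP_CXX_FOUND)
  # The imported target carries both the compile flags and the runtime
  # library, so no compiler-specific library name is needed here.
  target_link_libraries(${LIBRARY_NAME} PUBLIC OpenMP::OpenMP_CXX)
endif()
```

Targets written in other languages would use the matching imported target, for example OpenMP::OpenMP_Fortran.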

DmitryLyakh · 2019-07-30T23:33:45Z

I see, ok, will try tomorrow again (with openmp).

amccaskey · 2019-07-31T00:54:51Z

One issue we'll have to figure out is that under this model, we have introduced a cyclic dependency: numerics depends on runtime (num_server -> tensor_runtime), but runtime also depends on numerics (graph -> tensor_operation).

One proposal to break this would be to move Numerics Server to the exatn package.

amccaskey · 2019-07-31T01:01:29Z

I say this because I think this is what's causing the errors on devel_dil:

[ 98%] Linking CXX executable NumericsTester
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::~TensorRuntime()'
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::submit(std::shared_ptr<exatn::numerics::TensorOperation>)'
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::sync(exatn::numerics::TensorOperation&, bool)'
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::TensorRuntime(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::sync(exatn::numerics::Tensor const&, bool)'
collect2: error: ld returned 1 exit status
src/numerics/tests/CMakeFiles/NumericsTester.dir/build.make:92: recipe for target 'src/numerics/tests/NumericsTester' failed
make[2]: *** [src/numerics/tests/NumericsTester] Error 1
CMakeFiles/Makefile2:1908: recipe for target 'src/numerics/tests/CMakeFiles/NumericsTester.dir/all' failed
make[1]: *** [src/numerics/tests/CMakeFiles/NumericsTester.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Adding target_link_libraries(${LIBRARY_NAME} PUBLIC exatn-runtime) to numerics CMakeLists.txt leads to

-- Configuring done
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "exatn-numerics" of type SHARED_LIBRARY
    depends on "exatn-runtime" (weak)
  "exatn-runtime" of type SHARED_LIBRARY
    depends on "exatn-numerics" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
-- Build files have been written to: /home/cades/dev/exatn/build
Makefile:936: recipe for target 'cmake_check_build_system' failed
make: *** [cmake_check_build_system] Error 1

DmitryLyakh · 2019-07-31T01:36:08Z

Yes, agree, we need to move NumericsServer from numerics to exatn. Then NumericsServer will properly depend on numerics and runtime packages.
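The resulting dependency layout could be sketched as follows. The target names exatn-numerics and exatn-runtime come from the error output above; the exatn target name and the three source-list variables are assumptions for illustration.

```cmake
# numerics no longer contains NumericsServer, so it has no link to runtime.
add_library(exatn-numerics SHARED ${NUMERICS_SRC})  # hypothetical source list

# runtime still depends on numerics (graph -> tensor_operation).
add_library(exatn-runtime SHARED ${RUNTIME_SRC})    # hypothetical source list
target_link_libraries(exatn-runtime PUBLIC exatn-numerics)

# NumericsServer now lives in the top-level exatn library, which may
# depend on both packages without forming a cycle.
add_library(exatn SHARED ${EXATN_SRC})              # hypothetical source list
target_link_libraries(exatn PUBLIC exatn-numerics exatn-runtime)
```

With this layout the dependency graph is acyclic (exatn -> runtime -> numerics), so all three can remain shared libraries.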

amccaskey added the bug and enhancement labels Jul 18, 2019

amccaskey assigned amccaskey and osbornjd Jul 18, 2019

osbornjd mentioned this issue Jul 24, 2019

Adjusted build for BLAS and MPI implementation based on user specification #4

Merged

DmitryLyakh mentioned this issue Jul 30, 2019

exatn::getService failure #7

Closed

DmitryLyakh closed this as completed Aug 4, 2019

This was referenced Jul 15, 2020

Error thrown when trying to build ExaTn using Intel compilers. #37

Open

ExaTn Python Example Throws Undefined Symbol error. #38

Closed
