Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CMake build of ExaTensor #1

Closed
amccaskey opened this issue Jul 18, 2019 · 17 comments
Closed

Update CMake build of ExaTensor #1

amccaskey opened this issue Jul 18, 2019 · 17 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request

Comments

@amccaskey
Copy link
Member

We need to update tpls/CMakeLists.txt to provide a more robust build of the ExaTensor submodule.

@amccaskey amccaskey added bug Something isn't working enhancement New feature or request labels Jul 18, 2019
@amccaskey
Copy link
Member Author

Been thinking about this. We need to set it up so that CMake will use the ExaTensor environment variables by default, i.e. you prepend your cmake command with BLASLIB=... cmake .., for example. You can do this with the CMake $ENV{BLASLIB} syntax. So check if its available, and if not, use defaults for the given platform.

@osbornjd
Copy link

Okay, so I guess this is more or less what is already done for the OSX build if I understand correctly. Currently in the ExaTensor CMake file the OSX build grabs the e.g. CPP compiler from an environment variable in these lines to use in the build, whereas in the linux targets in e.g. this line the values are hard coded in. So all of these hard coded variables should have some reference variable similar to the current OSX target lines.

@amccaskey
Copy link
Member Author

Yes that's right.

@DmitryLyakh
Copy link
Member

From what I see now in the build system, the ExaTensor MPI path is imported from CMake, which is good. However, the BLAS library path in general includes more than one path, depending on the BLAS library implementation. ExaTensor currently supports four different BLAS libraries: ATLAS, MKL, ACML, and ESSL. Each of them requires its own way to set either one or multiple paths for linking. Ideally, we need to import this information from CMAKE as well and then pass to ExaTensor. Otherwise, we will have to configure ExaTensor separately, risking to mix different BLAS libraries (e.g., ExaTensor built with MKL while exatn linking with ATLAS).

@DmitryLyakh
Copy link
Member

Also, I think it is better to make EXA_TALSH_ONLY=YES by default in exatn build system. I sent an email last week with the following suggestion for differrnt build options (from the simplest to the hardest):

  1. Single-node, no MPI (i.e, no symbolic interface), TALSH only.
  2. Single-node, MPI (i.e., symbolic interface on as well), TALSH only.
  3. Multi-node, MPI (i.e., symbolic interface on as well), full ExaTensor.
    The bottomline is that in the majority of cases (C++/Python workstation users), we do not need MPI at all and can restrict ExaTensor to TALSH only. This will eliminate enoumous headaches with people complaining about installing their MPI implementation and picking the right compilers.

@DmitryLyakh
Copy link
Member

Implementation-wise, I would suggest the following:

  1. If the user does not explicitly choose the MPI library during CMAKE configure, assume no MPI at all.
  2. If the user does not explicitly specify BLAS environment variables for ExaTensor, try to import the necessary BLAS paths from CMAKE. If unable, set BLASLIB=NONE.

@osbornjd
Copy link

Okay, I think this makes sense - the issue is ensuring that user defined implementations of MPI/BLAS don't conflict with each other in exatn and exatensor. Additionally, it should be ensured that user definitions don't conflict with default CMake assumptions.

The bottomline is that in the majority of cases (C++/Python workstation users), we do not need MPI at all and can restrict ExaTensor to TALSH only. This will eliminate enoumous headaches with people complaining about installing their MPI implementation and picking the right compilers.

I certainly agree with this as when I was starting last week trying to build on my OSX environment part of the struggles of building was ensuring that the compiler between MPI and exatn were matching.

Implementation-wise, I would suggest the following:

  1. If the user does not explicitly choose the MPI library during CMAKE configure, assume no MPI at all.
  2. If the user does not explicitly specify BLAS environment variables for ExaTensor, try to import the necessary BLAS paths from CMAKE. If unable, set BLASLIB=NONE.

I will work on this implementation. Additionally we should add this in the readme so that it is transparent to the user what the CMake configuration is or is not assuming if environment variables are or aren't declared.

@osbornjd
Copy link

  1. If the user does not explicitly choose the MPI library during CMAKE configure, assume no MPI at all.

By the way, at the moment the build requires MPI and BLAS to be found, see this line. I assume then that this can be changed in order to implement your suggestion? I'm not sure if these lines were put here for a historical reason, or something else.

@amccaskey
Copy link
Member Author

You can remove the REQUIRED arg on those

@amccaskey
Copy link
Member Author

But look through the other CMakeLists files and make sure that if something like MPI_CXX_INCLUDE_DIRS is used you update to wrap it with if(MPI_FOUND) or something like that

@osbornjd
Copy link

Right, this is why I asked - removing these lines completely for example causes the build to fail likely for the reasons you suggest.

@DmitryLyakh
Copy link
Member

In branch devel_dil, I plugged in the TAL-SH backend to the tensor runtime package and most tests failed the linking step because our CMAKE files are not configured to address the dependencies of the TAL-SH library: CUDA libraries (if CUDA is enabled) and OpenMP runtime library (compiler specific). The devel_dil branch currently does not build. This is a stopper for me as I can't continue development until this works. We need to somehow resolve the systematic misconfiguration of CMAKE files throughout. I began fixing CUDA libraries part, but realized that I have no idea how to add OpenMP runtime library which is compiler specific. I see in a number of places where we hardcoded libgomp, but this is GNU only and will fail with other compilers. Any ideas how to fix this systematically throughout would be appreciated.

@amccaskey
Copy link
Member Author

I wouldn't call this a misconfiguration of CMake. This is a new addition, we just have to update the CMake build system to enable it. There is no systematic way to avoid these issues. We just have to implement the CMake code to enable new additions to the build as the come online.

As for OpenMP, cmake provides find_package(OpenMP). In order to see a description of this module, or any other module, you can run

cmake --help-module FindOpenMP

@osbornjd can help address these link issues tomorrow.

@DmitryLyakh
Copy link
Member

I see, ok, will try tomorrow again (with openmp).

@amccaskey
Copy link
Member Author

One issue we'll have to figure out is that under this model, we have introduced a cyclic dependency: numerics depends on runtime (num_server -> tensor_runtime), but runtime also depends on numerics (graph -> tensor_operation).

One proposal to break this would be to move Numerics Server to the exatn package.

@amccaskey
Copy link
Member Author

I say this because I think this is whats causing the errors on devel_dil:

[ 98%] Linking CXX executable NumericsTester
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::~TensorRuntime()'
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::submit(std::shared_ptr<exatn::numerics::TensorOperation>)'
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::sync(exatn::numerics::TensorOperation&, bool)'
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::TensorRuntime(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
../libexatn-numerics.so: undefined reference to `exatn::runtime::TensorRuntime::sync(exatn::numerics::Tensor const&, bool)'
collect2: error: ld returned 1 exit status
src/numerics/tests/CMakeFiles/NumericsTester.dir/build.make:92: recipe for target 'src/numerics/tests/NumericsTester' failed
make[2]: *** [src/numerics/tests/NumericsTester] Error 1
CMakeFiles/Makefile2:1908: recipe for target 'src/numerics/tests/CMakeFiles/NumericsTester.dir/all' failed
make[1]: *** [src/numerics/tests/CMakeFiles/NumericsTester.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Adding target_link_libraries(${LIBRARY_NAME} PUBLIC exatn-runtime) to numerics CMakeLists.txt leads to

-- Configuring done
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "exatn-numerics" of type SHARED_LIBRARY
    depends on "exatn-runtime" (weak)
  "exatn-runtime" of type SHARED_LIBRARY
    depends on "exatn-numerics" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
-- Build files have been written to: /home/cades/dev/exatn/build
Makefile:936: recipe for target 'cmake_check_build_system' failed
make: *** [cmake_check_build_system] Error 1

@DmitryLyakh
Copy link
Member

Yes, agree, we need to move NumericsServer from numerics to exatn. Then NumericsServer will properly depend on numerics and runtime packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants