
{bio}[foss/2022a] AlphaFold v2.3.1, HH-suite v3.3.0, Kalign v3.3.5, OpenMM 8.0.0 w/ Python 3.10.4 #17604

Merged
merged 13 commits into easybuilders:develop on Apr 14, 2023

Conversation

maxim-masterov
Collaborator

(created using eb --new-pr)

@orbsmiv
Contributor

orbsmiv commented Mar 27, 2023

@maxim-masterov unless there's a strong reason not to, can we bump the AlphaFold version to the current 2.3.1?
https://github.com/deepmind/alphafold/releases/tag/v2.3.1

@maxim-masterov
Collaborator Author

@orbsmiv I've updated it to v2.3.1. I've also added a patch to PDBFixer to make it compatible with the old OpenMM 7.5.1.

@maxim-masterov maxim-masterov changed the title {bio}[foss/2022a] AlphaFold v2.3.0, HH-suite v3.3.0, Kalign v3.3.5, ... w/ Python 3.10.4 {bio}[foss/2022a] AlphaFold v2.3.0, HH-suite v3.3.0, Kalign v3.3.5 w/ Python 3.10.4 Mar 28, 2023
@branfosj branfosj changed the title {bio}[foss/2022a] AlphaFold v2.3.0, HH-suite v3.3.0, Kalign v3.3.5 w/ Python 3.10.4 {bio}[foss/2022a] AlphaFold v2.3.1, HH-suite v3.3.0, Kalign v3.3.5 w/ Python 3.10.4 Mar 28, 2023
@boegel boegel added the update label Mar 28, 2023
@boegel boegel added this to the 4.x milestone Mar 28, 2023
@easybuilders easybuilders deleted a comment from boegelbot Mar 28, 2023
@orbsmiv
Contributor

orbsmiv commented Mar 29, 2023

Test report by @orbsmiv
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
bear-pg0105u03a.bear.cluster - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/orbsmiv/defcaf0c0806917e160069c56adc5eb3 for a full test report.

('UCX-CUDA', '1.12.1', versionsuffix),
('cuDNN', '8.4.1.50', versionsuffix, SYSTEM),
('NCCL', '2.12.12', versionsuffix),
('OpenMM', '8.0.0'),
Member

@maxim-masterov Doesn't OpenMM need to be GPU-capable for AlphaFold?

Collaborator Author

@boegel I'm checking it. I didn't include OpenMM compiled with CUDA support, as the previous easyconfigs didn't use it. I'm trying to build it now, but, as you mentioned earlier, I hit some errors. At the moment it looks like CMake picks up nvcc from /usr/bin instead of $EBROOTCUDA/bin.

Collaborator Author

@boegel I've added CUDA to OpenMM. It builds fine and passes all tests after specifying the OPENMM_CUDA_COMPILER variable. Without this variable, OpenMM tries to use /usr/local/cuda/bin/nvcc instead of ${EBROOTCUDA}/bin/nvcc. The only test that fails with this variable set is CudaCompiler, which requires OPENMM_CUDA_COMPILER not to be set, so I've excluded it.
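
A minimal sketch of how this could be expressed in the CUDA-enabled OpenMM easyconfig (assuming EasyBuild's pretestopts and runtest parameters together with CTest's -E exclusion flag; not necessarily the exact wording used in this PR):

    # Sketch only: make the test suite use EasyBuild's nvcc for OpenMM's runtime
    # kernel compilation instead of the hard-coded /usr/local/cuda/bin/nvcc.
    pretestopts = 'export OPENMM_CUDA_COMPILER="$EBROOTCUDA/bin/nvcc" && '

    # 'make test' drives CTest; ARGS="-E CudaCompiler" skips the one test that
    # requires OPENMM_CUDA_COMPILER to be unset.
    runtest = 'test ARGS="-E CudaCompiler"'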

Contributor

I gave this a try, as we also had some requests for this new AlphaFold version, but I do get an internal compiler error as well:

/dev/shm/f115372/OpenMM/8.0.0/foss-2022a/openmm-8.0.0/platforms/common/src/CommonKernels.cpp: In member function void OpenMM::CommonCalcGayBerneForceKernel::sortAtoms():
/dev/shm/f115372/OpenMM/8.0.0/foss-2022a/openmm-8.0.0/platforms/common/src/CommonKernels.cpp:5055:6: internal compiler error: in vect_get_vec_defs_for_operand, at tree-vect-stmts.c:1450
 5055 | void CommonCalcGayBerneForceKernel::sortAtoms() {
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0x69b359 vect_get_vec_defs_for_operand(vec_info*, _stmt_vec_info*, unsigned int, tree_node*, vec<tree_node*, va_heap, vl_ptr>*, tree_node*)
        ../../gcc/tree-vect-stmts.c:1450
0xf42df4 vect_build_gather_load_calls
        ../../gcc/tree-vect-stmts.c:2728
0xf42df4 vectorizable_load
        ../../gcc/tree-vect-stmts.c:8718
0xf4bca0 vect_transform_stmt(vec_info*, _stmt_vec_info*, gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
        ../../gcc/tree-vect-stmts.c:10922
0xf4fa41 vect_transform_loop_stmt
        ../../gcc/tree-vect-loop.c:9254
0xf6740d vect_transform_loop(_loop_vec_info*, gimple*)
        ../../gcc/tree-vect-loop.c:9690
0xf9059c try_vectorize_loop_1
        ../../gcc/tree-vectorizer.c:1104
0xf91181 vectorize_loops()
        ../../gcc/tree-vectorizer.c:1243
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Collaborator Author

Odd, which optarch flags do you use? And what machine are you building on?

Contributor

I didn't set any optarch, so it was using -march=native, and this was on an AMD EPYC 7763. I also tried OpenMM 7.7.0 with CUDA and foss/2022a, but that resulted in the same error.

Collaborator Author

I built successfully on an Intel Platinum 8360Y with optarch='Intel:O2 -march=core-avx2;GCC:O2 -mavx2 -mfma' and an NVIDIA A100. Commenting out CUDA allowed me to build on an AMD EPYC 7H12 without GPUs on board.

Collaborator Author

I was experimenting with compiler flags and managed to reproduce the same "internal compiler error". It appears when -march is used, which switches on the tree vectorization. Surprisingly, with -march=znver2 on our AMD CPUs everything works fine, but with, e.g., -march=skylake-avx512 on our Intel CPUs I get this error.

There are two ways to solve it, other than patching the compiler, which would require shrinking the code down to a small reproducible example and posting an issue on Bugzilla. The first is to compile OpenMM with the -fno-tree-vectorize flag. The second is to patch the platforms/common/src/CommonKernels.cpp file and add __attribute__((optimize("no-tree-vectorize"))) in front of the CommonCalcGayBerneForceKernel::sortAtoms() function definition. IMO, the second is better, as it affects only one function instead of the whole source code.
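
For reference, a sketch of how the two options could surface in the easyconfig (the vectorize toolchain option here is an assumption on my side; the patch file is the one referred to later in this thread as OpenMM-8.0.0_add_no_tree_vectorize.patch):

    # Option 1 (sketch, assuming EasyBuild's 'vectorize' toolchain option):
    # disable -ftree-vectorize for the whole build.
    toolchainopts = {'vectorize': False}

    # Option 2 (the approach preferred above): patch only the offending function,
    # i.e. add __attribute__((optimize("no-tree-vectorize"))) in front of
    # CommonCalcGayBerneForceKernel::sortAtoms() in
    # platforms/common/src/CommonKernels.cpp.
    patches = ['OpenMM-8.0.0_add_no_tree_vectorize.patch']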

@lexming
Contributor

lexming commented Apr 11, 2023

I confirm that just removing -ftree-vectorize is enough to work around this ICE. Using -march=native will only be an issue on those architectures affected by this bug.

On my side, I hit this ICE on our old Intel Broadwells with just AVX2. However, the build on our AMD EPYC 7282 (znver2) worked fine, and those also only support AVX2.

So maybe we can collect a few known systems where this ICE triggers and update the comment in OpenMM-8.0.0_add_no_tree_vectorize.patch to note that this patch is not always needed.

@orbsmiv orbsmiv mentioned this pull request Mar 31, 2023
@bedroge
Contributor

bedroge commented Apr 7, 2023

Test report by @bedroge
SUCCESS
Build succeeded for 7 out of 7 (4 easyconfigs in total)
a100gpu6 - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (icelake), 4 x NVIDIA NVIDIA A100-PCIE-40GB, 515.65.01, Python 3.6.8
See https://gist.github.com/bedroge/3f275fc4100b32e13b434de1a68f369b for a full test report.

@bedroge
Contributor

bedroge commented Apr 11, 2023

One other question, how certain are you / can we be about the compatibility of OpenMM 8 with AlphaFold? I read in the issue that @boegel referred to in this PR that the developer of OpenMM confirms that 7.7.0 is backwards compatible and should work fine, but is this also true for 8.0.0?

@boegelbot

This comment was marked as outdated.

@maxim-masterov
Collaborator Author

@bedroge According to the OpenMM developers, the only reason AlphaFold doesn't work with OpenMM >= 7.7.0 is some refactoring they did to the Python wrappers. All the functionality is backward-compatible. I've asked one of our users to test the installation built with the easyconfigs from this PR; I'll update this thread as soon as he confirms that everything works well (or not :) ).

@easybuilders easybuilders deleted a comment from boegelbot Apr 11, 2023
@easybuilders easybuilders deleted a comment from boegelbot Apr 11, 2023
@boegel
Member

boegel commented Apr 13, 2023

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3304.joltik.os - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 530.30.02, Python 3.6.8
See https://gist.github.com/boegel/98751425d948626ce9de0ee18acec970 for a full test report.

@bedroge
Contributor

bedroge commented Apr 13, 2023

Test report by @bedroge
FAILED
Build succeeded for 3 out of 4 (4 easyconfigs in total)
node1 - Linux Rocky Linux 8.7, x86_64, AMD EPYC 7763 64-Core Processor (zen3), Python 3.6.8
See https://gist.github.com/bedroge/4025ee53e861939bac6c4b2b74022ac8 for a full test report.

Can be ignored, accidentally ran it on the wrong machine without a GPU...

@maxim-masterov
Collaborator Author

@bedroge Did it fail because it was building OpenMM on a node without a GPU?

@bedroge
Contributor

bedroge commented Apr 13, 2023

Ah, yes, sorry, ran it in the wrong tab. Will try again on a GPU node.

@bedroge
Contributor

bedroge commented Apr 14, 2023

Test report by @bedroge
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
a100gpu6 - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (icelake), 4 x NVIDIA NVIDIA A100-PCIE-40GB, 515.65.01, Python 3.6.8
See https://gist.github.com/bedroge/6435e574b3f9eb475cb7c258dd466667 for a full test report.

@lexming
Contributor

lexming commented Apr 14, 2023

Test report by @lexming
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node402.hydra.os - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7282 16-Core Processor (zen2), 1 x NVIDIA NVIDIA A100-PCIE-40GB, 515.48.07, Python 3.6.8
See https://gist.github.com/lexming/5213437931009ac17a7deecb4ff3a6ad for a full test report.

@boegel
Member

boegel commented Apr 14, 2023

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3905.accelgor.os - Linux RHEL 8.6, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 530.30.02, Python 3.6.8
See https://gist.github.com/boegel/81d34a05059c772dbdc6abfe020b4491 for a full test report.

@boegel boegel modified the milestones: 4.x, next release (4.7.2) Apr 14, 2023
Member

@boegel boegel left a comment

lgtm

@boegel
Member

boegel commented Apr 14, 2023

Going in, thanks @maxim-masterov!

@boegel boegel merged commit 7ca4834 into easybuilders:develop Apr 14, 2023
10 checks passed
@boegel
Member

boegel commented Apr 14, 2023

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3905.accelgor.os - Linux RHEL 8.6, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 530.30.02, Python 3.6.8
See https://gist.github.com/boegel/af6db782b145e0298c28181a01fc814e for a full test report.

@boegel boegel changed the title {bio}[foss/2022a] AlphaFold v2.3.1, HH-suite v3.3.0, Kalign v3.3.5 w/ Python 3.10.4 {bio}[foss/2022a] AlphaFold v2.3.1, HH-suite v3.3.0, Kalign v3.3.5, OpenMM 8.0.0 w/ Python 3.10.4 May 27, 2023