
Problems with parallelization in v23.1 (compiled through the conda-forge channel) #1365

Open
gekbuccella opened this issue Jan 3, 2024 · 1 comment

@gekbuccella

Describe the bug
Molecular dynamics calculations with the conda-forge build of DFTB+ 23.1 are slow or get stuck when launched with "mpirun".

- with the OpenMPI variant, the problem can sometimes be avoided by reducing the number of requested cores, but this behaviour seems random;
- with the MPICH variant, the calculations proceed, but very slowly;
- when launched in serial (i.e. without "mpirun"), the calculations seem to run fine.
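
As a quick sanity check that the MPI launcher itself (rather than DFTB+) starts and finishes cleanly at the requested core count, one can run a trivial command under the same mpirun. This is only a suggestion, assuming the conda environment's mpirun is first on PATH; the rank count just mirrors the job script below:

which mpirun            # should resolve inside the dftbplus23.1 environment
mpirun --version        # shows whether OpenMPI or MPICH is actually being used
mpirun -np 6 hostname   # all six ranks should print a hostname and exit immediately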

To Reproduce
Installation steps:

conda create --name dftbplus23.1
conda activate dftbplus23.1
conda install --name dftbplus23.1 mamba
mamba install --name dftbplus23.1 -c conda-forge dftbplus=23.1=mpi_*_*
mamba install --name dftbplus23.1 -c conda-forge dftbplus-tools=23.1
mamba install --name dftbplus23.1 -c conda-forge dftbplus-python=23.1
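
After installation it may be worth confirming which MPI variant the solver actually resolved, since the mpi_*_* build string above matches both. A possible check (environment name as above, exact build strings will differ):

conda list --name dftbplus23.1 "dftbplus|openmpi|mpich"   # build string should contain mpi_openmpi_* or mpi_mpich_*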

Job script used to run the calculations:

#!/bin/bash
#SBATCH --account ttd
#SBATCH --partition ttdFast
#SBATCH --qos ttd
#
#SBATCH -J ER-400
#SBATCH -o ./out.%j.log
#SBATCH -e ./err.%j.log
#SBATCH -D ./
#SBATCH --mail-type=NONE
#SBATCH --mail-user=giacomo.buccella@rse-web.it
#
#SBATCH --get-user-env
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --ntasks-per-core=1

cd /home/TTD/giacomo/MD-ER/temperature-tests/400K

module purge

export OMPI_MCA_opal_cuda_support=true

echo "Start time: `date`"

mpirun dftb+ -i dftb_in.hsd > dftb_out.log

echo "End time: `date`"

Expected behaviour
Parallel calculations should run reliably and converge faster than the serial ones.

@bhourahine
Member

There is a memory bug in the 23.1 release, which is possibly the cause of your problem. It was fixed by #1281, #1284 and #1294, and the fix will be included in the next release. In the meantime, would it be possible for you to compile the latest unreleased version?
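
For reference, a source build of the current main branch with MPI enabled could look roughly like the sketch below. This assumes a working MPI toolchain plus ScaLAPACK and the other DFTB+ prerequisites; the install prefix is only an example:

git clone https://github.com/dftbplus/dftbplus.git
cd dftbplus
cmake -B _build -DWITH_MPI=TRUE -DCMAKE_INSTALL_PREFIX=$HOME/opt/dftbplus .
cmake --build _build -j
cmake --install _build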
