
DMRG-SCF error with pyscf+block2 #7

Closed
1iquidmoon opened this issue Mar 1, 2022 · 16 comments


@1iquidmoon

Hi All,

I would like to use block2 as the FCI solver in a DMRG-SCF scheme with PySCF 2.0.1. Both packages were installed via pip and can run some examples.

To connect PySCF and block2, I manually added a settings.py script:

import os
from pyscf import lib

BLOCKEXE = '/home/cuiys/.conda/envs/cuiys/bin/block2main'
BLOCKEXE_COMPRESS_NEVPT = '/home/cuiys/.conda/envs/cuiys/bin/block2main'
#BLOCKSCRATCHDIR = os.path.join('./scratch', str(os.getpid()))
BLOCKSCRATCHDIR = os.path.join(lib.param.TMPDIR, str(os.getpid()))
#BLOCKRUNTIMEDIR = '.'
BLOCKRUNTIMEDIR = str(os.getpid())
MPIPREFIX = 'mpirun'  # change to srun for SLURM job system
BLOCKVERSION = '0.4.10'

So now 'from pyscf import dmrgscf' works correctly. I tried to run an example:

from pyscf import gto, scf, mcscf, dmrgscf, mrpt
dmrgscf.settings.MPIPREFIX = 'mpirun -n 3'

mol = gto.M(atom='C 0 0 0; C 0 0 1', basis='631g', verbose=5)
mf = scf.RHF(mol).run()
mc = dmrgscf.DMRGSCF(mf, 4, 4)
mc.kernel()

But it terminated with an error. The error starts with:

Intel MKL ERROR: Parameter 5 was incorrect on entry to DGEMM .Intel MKL ERROR: Parameter 5 was incorrect on entry to DGEMM .

If I remove dmrgscf.settings.MPIPREFIX = 'mpirun -n 3', it also terminates with an error.

If I use

mc.fcisolver = dmrgscf.DMRGCI(mol) 
dmrgscf.dryrun(mc)

then I can run 'block2main dmrg.conf' normally. Can you give any clues?

Thanks in advance!
yunshu

@hczhai
Contributor

hczhai commented Mar 1, 2022

Thanks for finding this issue.

Unfortunately I cannot reproduce this error, so it may be environment dependent. You could try adding mc.fcisolver.threads = 2 (or 4) before the last line to see whether that solves the problem.

However, even if you can make a single DMRG calculation run this way, the CASSCF iteration will not work because the output formats of block2 and StackBlock are different. You will get another error (cannot find dmrg.e) anyway.

To do DMRGSCF with block2 we have to use another script. Currently, you can use the following to do DMRGSCF with block2:

from pyscf import gto, scf, mcscf
from pyblock2 import dmrgscf

mol = gto.M(atom='C 0 0 0; C 0 0 1', basis='631g', verbose=5)
mf = scf.RHF(mol).run()
mc = mcscf.CASSCF(mf, 4, 4)
mc.fcisolver = dmrgscf.DMRGCI(mf)
mc.kernel()

Notice that the second line is different from the StackBlock script. For this script to work there is no need to change anything in settings.py (so the StackBlock DMRGSCF and the block2 DMRGSCF can be supported simultaneously).

This pyblock2/dmrgscf.py interface is still experimental, so it does not support MPI or restarting yet, and its efficiency is not yet optimized. We will update it soon. Nevertheless, it would be helpful for us to know which features are important for you.

@1iquidmoon
Author

Many thanks for your kind reply.

I successfully ran DMRGSCF with your script. Is it true that I currently cannot perform a large DMRGSCF calculation with PySCF/block2?
Additionally, I am looking forward to a more efficient module for state-average calculations with different spins.

@hczhai
Contributor

hczhai commented Mar 1, 2022

Thanks for the feedback. You can leave this issue open and I will let you know when a more efficient pyblock2/dmrgscf.py is available (it will not take long).

In the meantime, I am also interested in understanding why there is an Intel MKL ERROR when you use your settings.py approach for DMRGSCF with block2.

If I remove dmrgscf.settings.MPIPREFIX = 'mpirun -n 3'. It also terminates with error.

Is this the same error as the Intel MKL ERROR? Does adding mc.fcisolver.threads = 2 solve the MKL error?
If the MKL error only comes with mpirun, then will changing mpirun -n 3 to mpirun --bind-to none -n 3 work? (This trick is mentioned in https://github.com/block-hczhai/block2-preview#mpi-version)

Also, with mpirun -n 3, it is important that you installed block2 using pip install block2-mpi. If you have already used pip install block2, you should first pip uninstall block2 and then pip install block2-mpi. The two versions cannot coexist.

@1iquidmoon
Author

You are exactly right. Indeed I made the mistake of installing both block2 and block2-mpi, so it seems I had always been using block2. After adjusting, I still ran into problems with my previous script. I don't know much about how MKL works, but I can give you some test results.

(I) For this script:

from pyscf import gto, scf, mcscf, dmrgscf

mol = gto.M(atom='C 0 0 0; C 0 0 1', basis='631g', verbose=5)
mf = scf.RHF(mol).run()
mc = dmrgscf.DMRGSCF(mf, 4, 4)
mc.kernel()

I just installed block2; pip list is as follows:

block2          0.4.10
cached-property 1.5.2
certifi         2021.10.8
cmake           3.17.0
h5py            3.6.0
intel-openmp    2022.0.2
mkl             2019.0
mkl-fft         1.3.0
mkl-include     2022.0.2
mkl-random      1.1.1
mkl-service     2.3.0
numpy           1.19.2
pip             21.2.2
psutil          5.9.0
pybind11        2.9.1
pyscf           2.0.1
pyscf-dmrgscf   0.1.0
scipy           1.6.2
setuptools      58.0.4
six             1.16.0
wheel           0.37.1

It fails with the Intel MKL ERROR.

(II) Then I ran pip uninstall block2, then pip install block2-mpi; pip list:

block2-mpi      0.4.10
cached-property 1.5.2
certifi         2021.10.8
cmake           3.17.0
h5py            3.6.0
intel-openmp    2022.0.2
mkl             2019.0
mkl-fft         1.3.0
mkl-include     2022.0.2
mkl-random      1.1.1
mkl-service     2.3.0
numpy           1.19.2
pip             21.2.2
psutil          5.9.0
pybind11        2.9.1
pyscf           2.0.1
pyscf-dmrgscf   0.1.0
scipy           1.6.2
setuptools      58.0.4
six             1.16.0
wheel           0.37.1

then run a script like

from pyscf import gto, scf, mcscf, dmrgscf
dmrgscf.settings.MPIPREFIX = 'mpirun -n 3'

mol = gto.M(atom='C 0 0 0; C 0 0 1', basis='631g', verbose=5)
mf = scf.RHF(mol).run()
mc = dmrgscf.DMRGSCF(mf, 4, 4)
mc.kernel()

In this case, block2 does not work. It terminates with

Traceback (most recent call last):
  File "/home/cuiys/.conda/envs/cuiys/bin/block2main", line 10, in <module>
    from block2 import SZ, SU2, SZK, SU2K, SGF
ImportError: /home/cuiys/.conda/envs/cuiys/lib/python3.7/site-packages/block2.cpython-37m-x86_64-linux-gnu.so: undefined symbol: ompi_mpi_uint64_t
Traceback (most recent call last):
  File "/home/cuiys/.conda/envs/cuiys/bin/block2main", line 10, in <module>
    from block2 import SZ, SU2, SZK, SU2K, SGF
ImportError: /home/cuiys/.conda/envs/cuiys/lib/python3.7/site-packages/block2.cpython-37m-x86_64-linux-gnu.so: undefined symbol: ompi_mpi_uint64_t
Traceback (most recent call last):
  File "/home/cuiys/.conda/envs/cuiys/bin/block2main", line 10, in <module>
    from block2 import SZ, SU2, SZK, SU2K, SGF
ImportError: /home/cuiys/.conda/envs/cuiys/lib/python3.7/site-packages/block2.cpython-37m-x86_64-linux-gnu.so: undefined symbol: ompi_mpi_uint64_t

At this point, mc.fcisolver.threads = 2 and mpirun --bind-to none -n 3 do not change anything. It seems that there is something wrong with my block2-mpi.

@hczhai
Contributor

hczhai commented Mar 1, 2022

Thanks for your detailed response. So now there are two separate problems (I) and (II).

For (I), this is using block2 without mpi. I noticed that in your settings.py you have MPIPREFIX = 'mpirun' as the default, and in your script in (I) you did not change MPIPREFIX. So inside DMRGSCF, it will call mpirun block2main dmrg.conf. Note that mpirun without -n means mpirun -n 16 if you have 16 cores on the node. So to make sure that no mpi parallelization is used, you have to set MPIPREFIX = '' or MPIPREFIX = 'mpirun -n 1'. Then I expect that you will not get the MKL error. Please let me know whether this is true. If you have other errors we can discuss them later.

The reason for the MKL error is that the non-mpi block2 has no mpi communication, so when you do mpirun -n 3 block2main, it will just run the same calculation three times simultaneously. Normally this should still give you the right answer, just more slowly. But block2 uses a scratch folder, and when the three processes read and write the same files, they interfere with each other and the loaded data (such as a matrix dimension) can be wrong, so the MKL operation may fail.
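The scratch-file race described above is also why the BLOCKSCRATCHDIR line in settings.py is keyed by PID: independent processes get their own directories and cannot clobber each other's intermediate files. A minimal stdlib sketch of the idea (illustrative only, not block2 API):

```python
import os
import tempfile

def per_process_scratch(base):
    # Create a scratch directory keyed by PID so that independent
    # processes never read or write each other's intermediate files.
    path = os.path.join(base, str(os.getpid()))
    os.makedirs(path, exist_ok=True)
    return path

base = tempfile.mkdtemp()
scratch = per_process_scratch(base)
print(scratch)
```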

For (II), this is likely due to the wrong openmpi version. The pip install block2-mpi requires the openmpi-4.0.x library (for example, openmpi-4.0.6). Other versions will not work because openmpi constantly changes its API. Unfortunately, the openmpi version cannot be controlled by pip. You can use ldd /home/cuiys/.conda/envs/cuiys/lib/python3.7/site-packages/block2.cpython-37m-x86_64-linux-gnu.so to check whether libmpi.so points to the anaconda libmpi.so. If so, then you can use conda list and it will print the version of openmpi. We need 4.0.x, where x can be any number. If you compile block2-mpi manually (namely, not using pip), you can use any openmpi version. Alternatively, if you tell me your openmpi version I can also create a pip package for that specific version.
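As a concrete sketch of the ldd check (the .so path is the one from the earlier traceback; adjust it to your environment):

```shell
# inspect which libmpi.so the block2 extension module is linked against
SO=/home/cuiys/.conda/envs/cuiys/lib/python3.7/site-packages/block2.cpython-37m-x86_64-linux-gnu.so
if [ -f "$SO" ]; then
    ldd "$SO" | grep libmpi
else
    echo "adjust SO to the block2 .so path on your machine"
fi
```

If the libmpi.so line points into an anaconda or intel directory rather than an openmpi-4.0.x installation, that mismatch is the likely cause.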

@1iquidmoon
Author

Sorry for my late reply.
(I) After I set MPIPREFIX = '' it works pretty well for block2 and stops when it fails to find dmrg.e. Many thanks!
(II) I have tried extensively to use block2-mpi with openmpi. Note that I created a new conda env, test. I found that ldd gives libmpi.so => /opt/apps/intel2018u4/compilers_and_libraries_2018.5.274/linux/mpi/intel64/lib/libmpi.so, which I have no permission to change. Thus I manually installed openmpi 4.0.6, set LD_LIBRARY_PATH=/home/cuiys/software/openmpi/lib:$LD_LIBRARY_PATH, and then ldd gives libmpi.so => /home/cuiys/software/openmpi/lib/libmpi.so (0x00002b67c9534000), which seems to be correct. Right after this, I ran the script with mpirun -n 3 or mpirun --bind-to none -n 3 or added mc.fcisolver.threads = 2. Then dmrg.out writes

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .
Traceback (most recent call last):
  File "/home/cuiys/.conda/envs/test/bin/block2main", line 1859, in <module>

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .
    dmrg.solve(tto, forward, 0)
RuntimeError: DataFrame::load_data on '/home/cuiys/.conda/envs/test/lib/python3.9/site-packages/28798/F0.PART.DMRG.RIGHT.1' failed.
Traceback (most recent call last):
  File "/home/cuiys/.conda/envs/test/bin/block2main", line 1859, in <module>
    dmrg.solve(tto, forward, 0)
RuntimeError: DataFrame::load_data on '/home/cuiys/.conda/envs/test/lib/python3.9/site-packages/28798/F0.PART.DMRG.RIGHT.1' failed.
MPI FINALIZE: rank 0 of 3
MPI FINALIZE: rank 2 of 3

and did not terminate. I don't know whether it applied openmpi correctly; however, the size of the tmp files seems to change, which makes me believe that something is going on. By the way, I also want to manually install block2 with cmake 3.17.0. After

cmake .. -DUSE_MKL=ON -DBUILD_LIB=ON -DLARGE_BOND=ON -DMPI=ON
make

I get errors with lots of

/home/cuiys/software/block2-preview/src/pybind/dmrg/../../core/operator_functions.hpp:117:9 error: ‘mats’ implicitly determined as ‘firstprivate’ has reference type
 #pragma omp task

Could you give me some guidance on it? Thank you for your patience.

@hczhai
Contributor

hczhai commented Mar 2, 2022

Thanks for your tests. Yes, manually installing openmpi 4.0.6 is a good approach. So now there are two new problems.

(1) Since you now have at least two versions of mpi installed on your system, you need to check carefully whether mpirun matches the mpi library libmpi.so used by block2. You can use the command which mpirun to see the full path of your mpirun. Ideally, it should be located in the same folder as your libmpi.so (indicated by the ldd command). I expect that in your case mpirun may still be the intel mpirun. If this is indeed the problem, you can solve it by setting a full path to the correct mpirun in MPIPREFIX, or by using the PATH environment variable.
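One way to guarantee the match, assuming the manually installed openmpi lives under /home/cuiys/software/openmpi as in the messages above, is to put it first on both search paths:

```shell
# make mpirun and libmpi.so come from the same openmpi installation
export PATH=/home/cuiys/software/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/home/cuiys/software/openmpi/lib:$LD_LIBRARY_PATH
# should now resolve to /home/cuiys/software/openmpi/bin/mpirun (if installed)
command -v mpirun || echo "mpirun not found on PATH"
```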

(2) The error is because your C++ compiler version (gcc/g++) is too old. We need roughly at least g++ 7.3.0. When you run cmake, it will print the compiler version. You can get a more recent g++ from conda (or you may use a module load ... command to load one). I normally use g++ 9.x. You can use the environment variables CC, CXX, MPICC, and MPICXX to control the compiler used by cmake. You may also need to recompile the openmpi library with the same compiler. If you see some other error I will be happy to provide further information. But hopefully you will not need to go through this manual installation process if you solve problem (1).
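A sketch of the toolchain selection (the compiler names are examples; substitute whatever module load or conda provides):

```shell
# point cmake at a modern toolchain before configuring block2
export CC=gcc
export CXX=g++
export MPICC=mpicc
export MPICXX=mpicxx
# then, from the build directory:
#   cmake .. -DUSE_MKL=ON -DBUILD_LIB=ON -DLARGE_BOND=ON -DMPI=ON && make
```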

@1iquidmoon
Author

1iquidmoon commented Mar 2, 2022

So I just followed the simple version of your guide,

$ ldd /home/cuiys/.conda/envs/test/lib/python3.9/site-packages/block2.cpython-39-x86_64-linux-gnu.so
	libmpi.so => /home/cuiys/software/openmpi/lib/libmpi.so (0x00002ba36101b000)

and directly ran

/home/cuiys/software/openmpi/bin/mpirun -n 3 block2main dmrg.conf

The dmrg.out shows

terminate called after throwing an instance of 'std::bad_alloc'
terminate called recursively
[mgr:66506] *** Process received signal ***
[mgr:66506] Signal: Aborted (6)
[mgr:66506] Signal code:  (-6)
terminate called recursively
terminate called recursively
terminate called recursively
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node mgr exited on signal 6 (Aborted).
--------------------------------------------------------------------------

It runs normally under block2main dmrg.conf.
Could you give me some help?

@hczhai
Contributor

hczhai commented Mar 2, 2022

(1) The 'std::bad_alloc' error may indicate that there is not enough memory. The default memory is 2 GB. When you run block2main dmrg.conf, there is only one instance, so the total cost is 2 GB. For mpirun -n 3 block2main dmrg.conf, the total cost will be 6 GB. You can use free -h to check how much memory is left on your node.

You can use

mc.fcisolver.block_extra_keyword = ["mem 1 g"]

to change the memory for each instance to 1 GB.

(2) If you think (1) is not the problem, you can attach your dmrg.conf, dmrg.out, and FCIDUMP (after the error termination) here. So that I can know a little bit more context of the problem.

(3) The block2main code is now updated to version 0.4.12. So now if you pip install block2-mpi==0.4.12 or pip install block2==0.4.12, you should be able to finish a DMRGSCF calculation using block2main as the executable (set in settings.py). The dmrg.e not found error will disappear.

(4) For optimal performance on larger systems, you should (a) set fcisolver.threads to the number of CPU cores in one node, and (b) make the number of mpirun -n processes equal to the number of nodes used. The --bind-to none option of mpirun is important whenever you use this hybrid parallelization. Other combinations of threads and mpi processes can also be okay, but may not be optimal for efficiency. When you only use one node, it is optimal not to use mpi (and to use threads instead).
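The rule of thumb in (4) can be written down as a tiny helper; hybrid_settings is a hypothetical illustration, not block2 or pyscf API:

```python
def hybrid_settings(nodes, cores_per_node):
    # One mpi rank per node, OpenMP threads filling each node;
    # on a single node, skip mpi entirely and rely on threads.
    if nodes <= 1:
        return {"MPIPREFIX": "", "threads": cores_per_node}
    return {
        "MPIPREFIX": f"mpirun --bind-to none -n {nodes}",
        "threads": cores_per_node,
    }

print(hybrid_settings(1, 16))
print(hybrid_settings(4, 16))
```

The returned values would map onto dmrgscf.settings.MPIPREFIX and mc.fcisolver.threads in an actual input script.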

@1iquidmoon
Author

Thanks a lot!
(1) It seems that there was not only a memory issue. Attached please find these files (I manually added the .txt extension in order to upload them):

dmrg.conf.txt
FCIDUMP.txt
dmrg.out.txt

(2) It's good to know about these updates. After solving the problem I would like to try some practical examples with block2 as soon as possible.

@hczhai
Contributor

hczhai commented Mar 3, 2022

Thanks for attaching the files. I can see the problem now. In your attached dmrg.out.txt, you see that things like INPUT START are printed three times, which makes the output hard to understand.
Ideally, they should only be printed once, no matter how many processes are used.

The reason for the error should be a problem in the mpi4py library. You can run the command mpirun -n 3 python -c 'from mpi4py import MPI;print(MPI.COMM_WORLD.rank)' and I expect that it will produce an error. If this is the case, you need to fix this problem first; then block2main will work, because mpi4py is used in block2.

The solution to the mpi4py problem is explained in https://block2.readthedocs.io/en/latest/user/installation.html?highlight=mpi4py#installation-with-anaconda. The basic idea is to first uninstall mpi4py, then set suitable environment variables such as MPICC, MPICXX, LD_LIBRARY_PATH, and PATH so that your manually installed openmpi 4.0.6 is visible. Then reinstall mpi4py using python -m pip install --no-binary :all: mpi4py. This ensures that mpi4py is built upon the same openmpi library used by block2. After that, rerun the command mpirun -n 3 python -c 'from mpi4py import MPI;print(MPI.COMM_WORLD.rank)' to see whether the problem is resolved.

The source of all these problems is that you have multiple mpi versions installed on your system. When this happens, we need to make sure that (1) mpirun is linked to the desired mpi library; (2) block2 is linked to the same library; and (3) mpi4py is built on the same library.

@1iquidmoon
Author

Thanks for your great guidance!

(1) Now I can run it. In fact, I found that I didn't have mpi4py installed; block2 works after I manually installed mpi4py built against my openmpi 4.0.6.
(2) I also tried 0.4.12. Yes, it can do DMRG-SCF as StackBlock does, with exactly the same DMRGSCF energy, and it seems to be faster for this tiny system. By the way, it seems that with the settings.py approach one cannot do a DMRG-NEVPT2 calculation; is that true? Or have I made some mistake?

Thanks again for guiding me so much!

@hczhai
Contributor

hczhai commented Mar 3, 2022

It is great that this is eventually solved. Since the issue that we discussed here may also help other potential users, I updated the README file to include a link to this issue.

The NEVPT2 implemented in block2 uses some very different algorithms (sc-nevpt2, ic-nevpt2, and uc-nevpt2), so it is hard to use via settings.py without changing the pyscf code significantly. Currently, StackBlock is still the most efficient code for DMRG-NEVPT2 for practical problems.

@hczhai
Contributor

hczhai commented Apr 7, 2022

@1iquidmoon

Hi again. Recently I did some tests using block2 as the DMRGSCF solver. I found a problem related to orbital reordering after restarting, which can sometimes cause the energy to increase during DMRGSCF. It does not affect standalone DMRG calculations. To fix this, you can update to the most recent version (v0.4.14) of block2 if you use pip (pip install --upgrade block2), or git clone the newest source code if you compile manually.

I also added a page in block2 documentation for DMRGSCF (including an example of state average of mixed spin states): https://block2.readthedocs.io/en/latest/user/dmrg-scf.html

Finally, the README page of pyscf/dmrgscf is also updated so that more people can know that block2 is a possible DMRG solver for CASSCF in pyscf: https://github.com/pyscf/dmrgscf

@1iquidmoon
Author

Thanks for the reminder. Here is an additional question: mcf.threads = 8 is specified in the example at https://block2.readthedocs.io/en/latest/user/dmrg-scf.html#state-average-with-different-spins. Do the two solvers run simultaneously?

@hczhai
Contributor

hczhai commented Apr 11, 2022

The two solvers will not run simultaneously. Maybe I should change mcf.threads = 8 to mcf.threads = int(os.environ.get("OMP_NUM_THREADS", 4)) to avoid confusion.
