Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring LIO #44

Open
manuelF opened this issue Dec 22, 2014 · 17 comments
Open

Refactoring LIO #44

manuelF opened this issue Dec 22, 2014 · 17 comments

Comments

@manuelF
Copy link
Collaborator

manuelF commented Dec 22, 2014

As everybody is making tons of progress towards end of the year, and now that multiple branches are in place, it is time to step back and try to make changes that would simplify the code complexity a lot.

We've discussed a bit with @nanolebrero about separating the code in different sections that perform the various calculations. As of now, lioamber takes everything but the XC code and g2g is only about XC.
A major issue would be how to deal with porting more and more things to CUDA (and Xeon Phi later on). This is very important, as @chopkins906 is making great progress on that front, as is @segmentat10nfault with CUBLAS port. The idea is to minimize code duplication and #ifdefs everywhere, that makes code hard to understand and hard to test.

A proposal to improve the current situation would be to split up the current folders into the different calculations when they become large enough (to be determined by the current maintainer).

This would be, as of now (considering @chopkins906 branch):

-lioamber/

-td/

-scf/
-- other/
-- xc/
----cpu/
----cuda/

-qmmm/
---cpu/
---cuda/

-common/
---cpu/ ?
---cuda/ ?

-generated_libs/

The idea would be mostly to break up lioamber/ into the different calculations that make up SCF, and to leave in lioamber/ the few things needed to actually connect to AMBER.

Also, since most of what was needed to do QMMM easily was on the g2g folder, that code would be moved to a common folder (things like Matrix, Timers, vec_types, memory pools). That would greatly simplify both g2g (now xc) and QMMM, making it more concise and unified. I think that most of the work @ramirezfranciscof is doing on Fortran could also go into this folder, it might benefit the C code too (not sure on how that might be done).

Of course this will be a mess of Makefiles at first, but I am confident this will be positive in the long run, as to reduce the amount of code duplication and complexity would lower the barrier of entry for everybody in the long run.

As I am not aware of everything done by LIO, I am requesting input from all of you. The proposed schema is just a first thought and in no way a fixed decision. If anyone has any ideas on this, let's share them so we can discuss them and make a better product.

@ramirezfranciscof
Copy link
Collaborator

The idea would be that each "main" folder compiles into its own library?

I agree with separating the "interface" with amber from the calculation procedures of lio. I think it will make it easier if we want to include lio as a lib for other packages.

Regarding the scheme proposed, if I understand it correctly, the idea would be to store the code that is used in all the separate type of runs (and not specific of one) inside of common so that each other folder has independent code (and all of them depend on common). I think this would result in a lot of files that are not very related going into common (memory pools and timer along with all the int subs) and just a couple of files in each of the other folders. Moreover, if I am not mistaken, qmmm is not a separate type of run like SCF and TD but is something that is used by all of them.

Personally, I'm more inclined to a scheme more similar to the one I am trying to use now, where I leave the main files loose in the main folder and then organize procedures that are related inside subfolders which compile into single modules. Something like this:

LIO
--Loose Files: Makefile, README
--lioamber/interfaces
--tests
--postprocessing
--src
----Loose Files: dip.f, dipmem.f, ehrenfest.f, init.f, finalize.f, liomain.f, magnus.f, mulliken.f, SCF.f, TD.f (among others, like a .mk)
----mathsubs
----integrals
----qmmm*
etc.

  • This is the difficult part because I don't have that great an idea of what is included in the g2g lib; if possible, I would prefer to create a module with "general" code to put stuff like the timers (and memory pools? I really don't know what are those), and the rest to be distributed in the corresponding already existing modules: for example, matrix multiplications in gpu would be inside of mathsubs, and since most procedures there are inside interfaces to "overload" them, I think we could just use ifdef like this:

interface basechange
#ifdef GPU
module procedure basechange_d
module procedure basechange_c
module procedure basechange_z
#else
module procedure basechange_d_gpu
module procedure basechange_c_gpu
module procedure basechange_z_gpu
#endif
end interface

If this is possible, I think that modules will be no problem for the gpu/cpu dualism; however it might not be as easy to do this neatly for the files that are loose in src, but I think this problem would also be present in the other scheme.

Anyway; is there a reason why this other scheme may not be advisable? Or why it might be worst than the one first proposed?

@chopkins906
Copy link
Collaborator

As far as the QM/MM stuff goes, however the code ends up structured, it should probably be a “sibling” of the XC routines. As @ramirezfranciscof says, qmmm is used in the SCF routine, as well as for the forces needed by Amber; it is doing integral calculations kind of like the XC code (different integrals but the form of the output is the same) - the “intsol” and “intsolG” routines in the Fortran code.

The main difference from XC in terms of how it’s called is that the energy/rmm update/Fock update/whatever you want to call it part of qmmm is only called once before the start of the SCF iterations (as opposed to the XC which needs to be run each iteration). The forces/gradients part of qmmm is called in the same way as for XC.

One small technical issue I ran into when trying to split off the qmmm GPU code into separate files from the XC stuff was the use of constant GPU memory symbols. The qmmm code needs some of the same variables declared in g2g/cuda/gpu_variables.h and then initialized in g2g/cuda/iteration.cu. However, my understanding is that these GPU variables only have file scope, so that when I tried to put the qmmm routine in a separate file and access those variables, it was as if I was accessing a new, uninitialized piece of memory.

It’s not really a huge deal; we’re using very little constant memory, so it would be easy to work around, but is there a canonical CUDA way to share constant memory between files/compilation units?

@ramirezfranciscof
Copy link
Collaborator

I have never touched the GPU code yet, and despite I went though like half a course of cuda programming, I'm pretty ignorant of the deep aspects of CUDA so I don't fully understand neither the question nor the answer and I'm not even sure this will be useful to you; but just in case it helps: http://stackoverflow.com/questions/17335052/cuda-and-file-scope.

@chopkins906
Copy link
Collaborator

OK, yeah I guess the "separate compilation mode" will be useful if we want to have separate GPU modules that are compiled separately and refer to common GPU memory locations (atom positions, etc)...seems it mostly just needs a change on the Makefile side.

@ramirezfranciscof
Copy link
Collaborator

So....what happened with this (this = the reorganization scheme)?

@chopkins906
Copy link
Collaborator

Small update here: I pushed an update to the rc branch that splits the QM/MM / Coulomb GPU code off into a separate file / compilation unit (from the XC code) using separate compilation mode as in the article that @ramirezfranciscof mentioned before. These changes to the code/Makefiles provide one model for keeping separately compiled GPU code (that share common device memory). This should make it a little easier if we want to move the QM/MM and Coulomb stuff to a separate directory (though I think this part of the code should be kept close to the XC code, maybe in one "integrals" parent directory).

@fedepedron fedepedron added this to the LIO release 2.0 milestone Dec 22, 2016
@ramirezfranciscof
Copy link
Collaborator

New Directory Schema:

/src (source code, makefiles and compiled objects)
/lib (compiled libraries)
/test (system tests)
/dat (basis sets...anything else?)
/bin (compiled executables)
/doc (lio user manual)

Internal structure of src is still to be determined.

@ramirezfranciscof ramirezfranciscof self-assigned this Dec 23, 2016
@fedepedron
Copy link
Collaborator

While updating the readme file I came across a small issue to take into account: in AMBER, LIO library locations are hardcoded in LIOHOME/g2g LIOHOME/lioamber. In GROMACS this is not an issue since the library path is introduced as a compilation flag; maybe we should do the same with AMBER.

@ramirezfranciscof
Copy link
Collaborator

ramirezfranciscof commented Jan 12, 2017

Ok, what we can do for the time being is have a "make amber_adapt" that generates those two folders and just puts a symbolic link to the real location of the libraries. If the libs are recompiled, the links should not need to be updated, am I right? An alternative solution is that every time the libs are compiled, they are automatically copied to those dirs...


Proposed method of organizing fortran source files: all should be inside a module (perhaps an exception could/have to be the ones called by external programs), and modules are organized as follows...

modname.f90 (file containing the module)
modname.mk (file containing compiling info for the module)
modname (folder containing files to be included in the f90, if needed for organizational purposes)
[modname.mod] (generated by compilation)
[modname.o] (generated by compilation)

modname.mk should contain the dependency on included files and the list of external modules that depend on modname. It is still not clear to me if it is better to have the corresponding flags that affect this module here or if there should be a separate make_flagname.mk for each flag or a single make_flags.mk for all flags (I would tend to think this last one may be the best option). Comments on these ideas are welcomed.

@fedepedron
Copy link
Collaborator

Well, step by step:

  1. Agreed on the "make amber_adapt" or whatsoever, I believe a symlink should be way easier to handle than copying the libraries over and over again.

  2. I have yet to understand the difference between the modname.f90 and the modname folder; I mean, is everything together in a single file or separated in a thousand? Or modname.f90 just contains common routine/variable names/parameters?

  3. I believe a single make_flags for all flags is a better option than keeping separate make_flagnames.

@ramirezfranciscof
Copy link
Collaborator

About (2): A module "populations.f90" may contain 2 or 3 subroutines to calculate electronic populations and may all fit comfortably in that file. But a module "math.f90" that contains matrix multiplications, base changes, commutators, etc. (where you may have like 4 - 8 versions of each) would perhaps be too lengthy if it was all in one file. I'm leaving the option of having a folder in which to dump "matmult.f90", "basechange.f90", "conmut.f90" and then have "#include" statements in "math.f90". To see how this works you can check the actual mathsubs folder in lioamber (the idea would be to take mathsubs.f90 and mathsubs.mk and put them outside the folder).

One could ask in such a case if matmult.f90, basechange.f90 and conmut.f90 shouldn't be separated modules. I'm still not a 100% sure on how to best organize/modularize code; personally it would bug me to have such inter-related procedures not be associated/grouped in any way. Ideas on how to deal with this would indeed be welcome.

@fedepedron
Copy link
Collaborator

Thanks for the clarification, I too think it would be a good idea to have the module's folder (in the case of math, containing matmult, basechange and conmut) and then module.f90 and module.mk outside as you said.

@fedepedron
Copy link
Collaborator

fedepedron commented Oct 13, 2017

In our last meeting, we decided on:

  • /bin -> contains lio executable
  • /lib -> libraries for lio and lioamber/gromacs/hybrid
  • /src -> source code
  • /docs -> manual
  • /tests -> tests
    Further development will be discussed.

@nanolebrero
Copy link
Collaborator

nanolebrero commented Oct 13, 2017 via email

@fedepedron
Copy link
Collaborator

Aknowledged

@ramirezfranciscof
Copy link
Collaborator

UPDATED STRUCTURE

-/bin -> contains lio executable

-/docs -> manual
-/docs/src
-/docs/manual.pdf

-/lib -> libraries for lio and lioamber/gromacs/hybrid

-/readme.md

-/src -> source code; we need to discuss about this internal structure
-/src/liosolo
-/src/tdanalize
-/src/lioamber -> rename, at the least
-/src/g2g

-/tests -> tests, organized by executable, with a separated folder for fast basic tests (option 1)
-/tests/liosolo_tests
-/tests/amber_tests
-/tests/gromacs_tests
-/tests/basic_tests

-/tests -> tests, organized by feature, with a separated folder for fast basic tests (option 2)
-/tests/featureX_tests -> one folder for each of SCF, TD, transport, DFTB, ehrenfest, pseudopots, etc.
-/tests/external_tests -> the same for amber, gromacs, etc. tests
-/tests/basic_tests

@fedepedron
Copy link
Collaborator

Seems we are sticking with the structure @ramirezfranciscof mentioned; I would further separate aint_gpu from g2g; but we could also go with this structure:

-/src/cpp_cuda/g2g
-/src/cpp_cuda/aint
-/src/fortran/
-/src/lioexe/

Fortran would be lioamber and lioexe is liosolo. The names are of course placeholders.

@ramirezfranciscof ramirezfranciscof removed their assignment Mar 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants