New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-linear (quadratic?) scaling of SOAP calculation #31
Comments
Hi @anjohan! Wow, 250 k atoms :) Are you actually working with systems of this magnitude? The linear scaling in classical MD is largely possible if there is no need to calculate neighbour lists. This is typically the case with fixed bonds. As soon as you have to start figuring out which atoms are within the radial cutoff, there is a need to calculate the pairwise distances between atoms. SOAP is quite interesting in this case. In theory, there is no need to ever calculate pairwise distances. One could just form a gaussian atomic density based on all atoms. This would make the algorithm scale linearly. In practise however, this linear calculation will become quite expensive in terms of time and memory if the atomic density consists of thousands of atoms. So instead, in our current implementation, only atoms within the cutoff are included in the atomic density when calculating the expansion on individual sites. This requires that we calculate the pairwise distances once. With the current implementation this results in the O(n^2) scaling you here demonstrate. The current distance calculation is quite naive and I can improve it to scale as O(n log n). This faster distance calculation is actually already implemented for many other descriptors in the package. It was not implemented for SOAP as with typical system sizes this distance calculation is negligible, and only starts to dominate the SOAP calculation at very large system sizes. So in summary, I will implement the faster pairwise distance calculation, which will make it scale O(n log n). I will also see if the fully linear calculation is feasible, but I fear it has such a big constant prefactor that it will not be usable in practice. |
With what SOAP settings do you get 1.5 h/configuration? |
Thanks for the quick response!
Yes (unfortunately). I'm using the SOAP vectors to visualise phase transitions for fairly large systems. This may not have been your intended purpose, but it's working superbly!
Hm. Thanks to dscribe, I haven't had to dive too deeply into the descriptor calculations, but I was hoping that it should be linearly scaling due to the cutoff. From what I understand, SOAP vectors rely on summing Gaussians centred at each neighbour and then projecting this onto some basis. This is conceptually similar to computing short-range forces in MD, so I was hoping the spatial binning trick would work, i.e.
The time thus ends up being proportional to (number of atoms) x (number of neighbours), where the number of neighbours is approximately rho x (27rcut^3), rho being the density. While the number of neighbours is a pretty large prefactor, it's still overall linear, and with N=250k, the number of neighbours per atom is very small compared to the total number of atoms. Would this not work for SOAP vectors? I did look around inside the libsoap directory, but the code is (understandably) quite involved. From what I gather, the If you create a spatial binning of the atoms, the Sorry if my explanation is rambly, here's some pseudo-code: def soap(...):
atomlist = way of storing atoms
L = system size (assuming cubic)
rcut = cutoff
Nbins = ceil(L/rcut)
binsize = L/Nbins
all_atoms = atomlist with all the atoms
soap_positions = where SOAPs should be calculated
atomlists = atomlist[Nbins][Nbins][Nbins] # array of empty atom lists
# do spatial binning, fill atomlists
for atom in all_atoms:
i = atom.x/binsize
j = atom.y/binsize
k = atom.z/binsize
atomlists[i][j][k].append(atom)
for position in soap_positions:
neighbours = getFilteredPos(position, atomlists)
calculate SOAPs based on neighbours
def getFilteredPos(position, atomlists):
# Determine which bin
this_i = position.x/binsize
this_j = position.y/binsize
this_k = position.z/binsize
neighbours = []
# Loop over neighbouring bins
for i in this_i+[-1,0,1]:
for j in this_j+[-1,0,1]:
for k in this_j+[-1,0,1]:
for atom in atomlists[i][j][k]:
if norm(atom.position - position) < rcut:
neighbours.append(atom)
return neighbours
rcut = 6.0, nmax=lmax=3. I'm essentially running the system in my example above with 50x50x50 unit cells. With 18x18x18 unit cells (11664 atoms), each frame took 10-20 seconds. These runs are on fairly standard intel nodes on the cluster. Since I have to calculate SOAP vectors for 100-1000 of these frames, I've parallelised it with mpi4py. |
Thanks for the detailed answer! You're application sounds very interesting and I would be very interested to see what comes out of such visualization. I think you are right and the binning might be the best solution since there is only one global cutoff. Should have thought about that earlier, thanks for pointing it out! I will try my hand at implementing and testing it. Unfortunately I cannot promise any schedule, as I don't right now have too much time to spend on this. |
No worries, this isn't urgent. The SOAP vector calculation still takes much less time than the MD simulation, since I'm only calculating descriptors for a small subset of the frames and using mpi4py. I should also be able to add a naive (but linearly scaling) binning myself, now that I understand more of libsoap. I may have some time later this week. |
I've started working on an implementation. |
That's great! External contributors are very welcome. Will you be doing this on the python side or in C/C++? For now this doesn't really matter, but eventually, I would want this functionality in the C/C++ side and available to all the other descriptors as well. |
In C. As a result, I'm currently enjoying some segfaults. |
Have fun :> |
Thanks @anjohan for contributing the spatial binning algorithm! It is now fully tested and integrated into SOAP in version 0.3.2. |
Hi,
With the SOAP vectors working perfectly, I turned to a "real" system with more atoms. With 250 000 atoms, each frame takes 1.5 hours.
From my experimenting (see below), it seems that the SOAP vector calculation scales approximately linearly as a function of the square of the number of atoms, i.e. quadratically with the number of atoms. This goes for both 0.2.8 and the newest GitHub version.
Since there is a cut-off for interatomic distances, I would have expected the SOAP calculation to scale linearly with the number of atoms like classical MD force fields. Is this not possible?
The text was updated successfully, but these errors were encountered: