Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOAP descriptors of different species too similar? #58

Closed
ryokbys opened this issue Feb 12, 2021 · 5 comments
Closed

SOAP descriptors of different species too similar? #58

ryokbys opened this issue Feb 12, 2021 · 5 comments

Comments

@ryokbys
Copy link

ryokbys commented Feb 12, 2021

Hi,
It's a great project.
I tried to create SOAP descriptors of a crystalline system including a few chemical species. But the obtained SOAP descriptors of other species seem too close. Attached is an example of Mg2SiO4 and the plot at the end of the notebook shows very close SOAP values for Mg, Si, and O species. I am not very sure whether it is wrong, but if I make SOAP descriptors with a different program values for the species are much different. Could you tell me if something wrong with my way of creating SOAP?

Versions

  • dscribe 0.4.0
  • ase 3.20.1

structure
スクリーンショット 2021-02-12 11 04 58

$ cat POSCAR
Mg8 Si4 O16
1.0
4.802840 0.000000 0.000000
0.000000 6.047893 0.000000
0.000000 0.000000 10.323355
Mg Si O
8 4 16
direct
0.000000 0.000000 0.000000 Mg
0.500000 0.000000 0.500000 Mg
0.000000 0.500000 0.000000 Mg
0.500000 0.500000 0.500000 Mg
0.008375 0.750000 0.277264 Mg
0.508375 0.250000 0.222736 Mg
0.991625 0.250000 0.722736 Mg
0.491625 0.750000 0.777264 Mg
0.925974 0.750000 0.593638 Si
0.425974 0.250000 0.906362 Si
0.074026 0.250000 0.406362 Si
0.574026 0.750000 0.093638 Si
0.276946 0.466986 0.837165 O
0.776946 0.533014 0.662835 O
0.723054 0.966986 0.162835 O
0.223054 0.033014 0.337165 O
0.723054 0.533014 0.162835 O
0.223054 0.466986 0.337165 O
0.276946 0.033014 0.837165 O
0.776946 0.966986 0.662835 O
0.777851 0.750000 0.446782 O
0.277851 0.250000 0.053218 O
0.222149 0.250000 0.553218 O
0.722149 0.750000 0.946782 O
0.265961 0.750000 0.591677 O
0.765961 0.250000 0.908323 O
0.734039 0.250000 0.408323 O
0.234039 0.750000 0.091677 O
@lauri-codes
Copy link
Contributor

Hi @ryokbys!

I don't see anything fundamentally wrong with your current setup. In the context of the original definition for SOAP (https://doi.org/10.1103/PhysRevB.87.184115), there are a few parameters you can tune if you are interested in tuning SOAP to be more sensitive to the sites:

  • Reduce rcut
  • Try crossover=False

Also, if you are using these descriptors in a machine learning environment, the learning algorithm hyperparameters should adjust quite well to detecting even smaller changes.

There are some ideas for improving the locality of SOAP that go beyond the original definition, see issue #20. Out of curiosity, what other code are you using, and which parameters are you using?

@ryokbys
Copy link
Author

ryokbys commented Feb 12, 2021

Thanks @lauri-codes for your quick response.

The reason why asked this is because I got the following PCA map of SOAP features obtained from polycrystalline 4-component system using DScribe, which was much different from I expected. I expected that points of each species make separate clusters. But this could be true...
PCA_SOAP_poly

Actually the PCA map obtained from crystalline of the same 4-component system, I got more reasonable-looking(?) one like the following.
PCA_SOAP_crystal

The one I used to compare SOAP descriptor is a fortran wrapper routine that calls GAP code in QUIP. But unfortunately it is not scalable to large system, that's why I am trying DScribe. The SOAP graph obtained by it with almost the same configuration looks more different between species, but it could also be the magic of ordering of SOAP index or something.

Anyway, thanks for your advices.

@lauri-codes
Copy link
Contributor

Hi @ryokbys!

Coming back to this issue in preparation for version 1.1.0: we will add a new weighting option for SOAP that will greatly increase the local resolution. Here is an example script of using it (the structure is here: forsterite.zip):

import numpy as np
import matplotlib.pyplot as mpl
from ase.io import read
from dscribe.descriptors import SOAP

atoms = read("./forsterite.poscar")
atoms.wrap()

soap = SOAP(
    species=["Mg", "Si", "O"],
    periodic=True,
    weighting={"function": "poly", "r0": 6, "c": 1, "d": 1, "m": 4},
    nmax=4,
    lmax=3,
    crossover=True,
)
feat = soap.create(atoms)
fix, ax = mpl.subplots(figsize=(10, 5))
for index in [0, 8, 24]:
    i_feat = feat[index]
    x = np.arange(0, i_feat.shape[0])
    mpl.plot(x, i_feat, label=atoms.get_chemical_symbols()[index])
mpl.legend()
mpl.tight_layout()
mpl.show()

This produces the following plot for your structure:
forsterite

As you can see, by introducing a radial weighting that monotonuously decreases towards zero the further you go from a center (this is controlled by the new weighting-dictionary), you can remedy the problems discussed in issue #20. There are several parameters that you can use to tune the weighting to fit your needs. We are preparing version 1.1.0 for release, but if you want, you can already play with it by cloning branch 1.1.0 from git and installing it with

cd dscribe
pip installl .

@lauri-codes
Copy link
Contributor

Closing this for now, please see the previous comment.

@futuring-wcy

This comment was marked as off-topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants