Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determine a species linearity from xyz #1601

Merged
merged 5 commits into from May 17, 2019
Merged

Determine a species linearity from xyz #1601

merged 5 commits into from May 17, 2019

Conversation

alongd
Copy link
Member

@alongd alongd commented May 14, 2019

Arkane uses the species linearity when projecting out rotor frequencies.
This PR automates the determination of the linearity attribute, so now it is optional in input files.
If the problem can be reduced in dimensionality, we first identify that. We then check that all atoms from the 3rd one on form a straight line (either in 2D space or in 3D space) within some precision with the first two atoms.
Examples and documentations were updated. Tests were added.

@alongd alongd added the Arkane label May 14, 2019
@alongd alongd requested a review from cgrambow May 14, 2019 02:50
@alongd alongd self-assigned this May 14, 2019
@codecov
Copy link

codecov bot commented May 14, 2019

Codecov Report

Merging #1601 into master will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1601      +/-   ##
==========================================
+ Coverage   41.65%   41.67%   +0.01%     
==========================================
  Files         176      176              
  Lines       29102    29102              
  Branches     5988     5988              
==========================================
+ Hits        12122    12127       +5     
+ Misses      16142    16138       -4     
+ Partials      838      837       -1
Impacted Files Coverage Δ
rmgpy/data/kinetics/family.py 52.97% <0%> (+0.23%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b944ddb...a66ffee. Read the comment docs.

def nearly_equal(a, b, sig_fig=5):
"""
A helper function to determine whether two floats are nearly equal.
Can be replaced by math.isclose() in Python3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also consider using numpy.isclose.


x, y, z = [], [], []

for coordinate in coordinates:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace this using x, y, z = zip(*coordinates).

y.append(coordinate[1])
z.append(coordinate[2])

# first check whether the problem can be reduced into one or two dimensions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of doing all of this, it might be simpler to just check the angles between all atom triples and make sure they're all close to 180 degrees. To do so, you could calculate the distance vectors between all atoms and then check that the angle between any two of them is close to 180 degrees (or -180 degrees). A very efficient implementation of this should be possible with Numpy. Not sure what a suitable tolerance would be. I don't know if this is a better idea; let me know what you think.

For example, a tensor containing all distance vectors can be calculated with Numpy broadcasting in a single line:

d = -np.array([c[..., np.newaxis] - c[np.newaxis, ...] for c in coordinates.T])

which then gives the vector between atom 0 and atom 1 as d[:, 0, 1].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both methods are equivalent in essence, but your suggestion is significantly more friendly to implement / maintain / debug. I'd happily adopt it.

if linear is None:
linear = is_linear(coordinates, label)
not_txt = '' if linear else ' not'
logging.info('determined species {0}{1} to be linear'.format(label, not_txt))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only print this if you detect that the species is linear?

[-1.02600933, 2.12845307, 0.00000000],
[-0.77966935, 0.95278385, 0.00000000],
[-1.23666197, 3.17751246, 0.00000000],
[-0.56023545, -0.09447399, 0.00000000]]) # C2H2, just 0.5 degree off from linearity, so NOT linear

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think would be a good tolerance for determining linearity? Are there cases where we think that 0.5 degrees should be linear?

@alongd
Copy link
Member Author

alongd commented May 16, 2019

The tolerance is indeed a mystery.
I looked up CCCBDB for some linear molecules at a lousy (B3LYP/6-31G*) and a fancy method (CCSD(T)=FULL/aug-cc-pVQZ), but for CO2 and C2H2 both methods had exactly 180.0 (or 0.0) degrees between all triples. Thinking that they probably used Z-Mats, I ran CO2, C2H2, HCN, NCN(S), NCN(T), C#CC#C, and C#CC#CC#CC#C at wb97xd/6-311+G(d,p) using ARC (no ZMats), but got the same thing: all angles were dead on 180.0 (or 0.0) degrees (I used Gaussian).

So I plan on using 0.2 degrees for the tolerance, just based on my hunch, but I'm open for sugestions.

I added fixup commits and rebased, let me know when to squash and whether you have any other comments

Thanks for helping me improve the algorithm!

if i > 1:
u1 = d[:, 0, 1] / numpy.linalg.norm(d[:, 0, 1]) # unit vector between atoms 0 and 1
u2 = d[:, 1, i] / numpy.linalg.norm(d[:, 1, i]) # unit vector between atoms 1 and i
a = math.degrees(numpy.arccos(numpy.clip(numpy.dot(u1, u2), -1.0, 1.0))) # angle between atoms 0, 1, i

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using numpy.clip to avoid invalid arguments for numpy.arccos if there are floating point errors in the calculation of the dot product?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought to use cautious although my examples ran fine without it

return False
# A tensor containing all distance vectors in the molecule
d = -numpy.array([c[:, numpy.newaxis] - c[numpy.newaxis, :] for c in coordinates.T])
for i, c in enumerate(coordinates):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't actually need to iterate over coordinates, correct? I'd just do
for i in range(2, len(coordinates))
and remove the if-statement on the next line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely!

@cgrambow
Copy link

That actually really interesting and good to know. 0.2 is probably ok for now, maybe even smaller based on your observations? We can adjust as we learn more. I requested one small change. After that you can squash.

@alongd
Copy link
Member Author

alongd commented May 17, 2019

Thanks for all the comments, the PR is in a much better shape now. Let me know if there are any other issues, otherwise this should be good to go

@alongd
Copy link
Member Author

alongd commented May 17, 2019

@cgrambow, I changed the tolerance to 0.1 degrees and added a descriptive comment (squashed)

Copy link

@cgrambow cgrambow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just one more if-statement that you can remove.

# A tensor containing all distance vectors in the molecule
d = -numpy.array([c[:, numpy.newaxis] - c[numpy.newaxis, :] for c in coordinates.T])
for i in range(2, len(coordinates)):
if i > 1:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this if-statement anymore since i now starts at 2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, removed

@cgrambow cgrambow merged commit 19837cd into master May 17, 2019
@cgrambow cgrambow deleted the linearity branch May 17, 2019 15:29
@mliu49 mliu49 mentioned this pull request May 23, 2019
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants