## This is the Realignment Script for Alkenes

Alkenes are realigned by the following rules (where the previous rule is always applied):

1. BFS volume

2. If there are multiple substituents that share the BFS volume: sort by the substitution that has the smallest BFS volume cis
 - Inspired by Sharpless quadrant model: Steric bulk Cis to quadrant is most important

3. Minimum 99th Percentile ESP value from there grid

4. If there are multiple substituents that share the same ESP99: sort by the one that has the smallest ESP99 trans
   - Inspired by Norrby quadrant model: Secondary pi interaction trans to substituent

5. Maximum Polarizability

6. If there are multiple substituents that share the same Polarizability: sort by the one that has the largest polarazability trans
    - Inspired by Norrby quadrant model: Secondary pi interaction trans to substituent

If all fall through, the alignment defaults to the original scheme with maximum volume. This assumes that the quadrants themselves
are very comparable with little difference between them and arbitrary alignment will not have much impact on them.

In [4]:
import molli as ml
import numpy as np
from tqdm import tqdm
from molli.math import rotation_matrix_from_vectors, rotation_matrix_from_axis

class AlkeneRel:
    def __init__(self, ml_mol, a1, a2, a3, a4):
        self.ml_mol = ml_mol
        self.a1 = a1
        self.a2 = a2
        self.a3 = a3
        self.a4 = a4
        self.atoms = [a1, a2, a3, a4]
    
    def trans_atom(self, atom):
        '''Returns the trans atom

        Parameters
        ----------
        atom : ml.Atom
            Atom of interest

        Returns
        -------
        ml.Atom
            Returns atom that is trans to the atom selected
        '''
        
        match atom:
            case self.a1:
                return self.a3
            case self.a2: 
                return self.a4
            case self.a3:
                return self.a1
            case self.a4:
                return self.a2
            case _:
                raise ValueError("Atom not in the defined four atoms")

    def cis_atom(self, atom):
        '''Returns the cis atom

        Parameters
        ----------
        atom : ml.Atom
            Atom of interest

        Returns
        -------
        ml.Atom
            Returns atom that is trans to the atom selected
        '''
        match atom:
            case self.a1:
                return self.a2
            case self.a2: 
                return self.a1
            case self.a3:
                return self.a4
            case self.a4:
                return self.a3
            case _:
                raise ValueError("Atom not in the defined four atoms")
            
    def gem_atom(self, atom):
        '''Returns the geminal atom

        Parameters
        ----------
        atom : ml.Atom
            Atom of interest

        Returns
        -------
        ml.Atom
            Returns atom that is trans to the atom selected
        '''
        match atom:
            case self.a1:
                return self.a4
            case self.a2: 
                return self.a3
            case self.a3:
                return self.a2
            case self.a4:
                return self.a1
            case _:
                raise ValueError("Atom not in the defined four atoms")

    def max_vol(self, atom):
        return atom.attrib['Sterimol']['vol']
    def bfs_vol(self, atom):
        return atom.attrib['Sterimol']['bfs2']
    def esp99(self, atom):
        return atom.attrib['99ESPMax']
    def pol(self, atom):
        return float(atom.attrib['pol'])

def set_origin(ml_mol:ml.Molecule,i: int):
    ml_mol.translate(-1*ml_mol.coords[i])

def rot_rz(ml_mol: ml.Molecule, c0:ml.Atom, c1: ml.Atom):
    v1 = ml_mol.vector(c0, c1)
    t_matrix = rotation_matrix_from_vectors(v1, np.array([1,0,0]))
    ml_mol.transform(t_matrix)

def rot_xy(ml_mol: ml.Molecule, q1a:ml.Atom, c0:ml.Atom, c1:ml.Atom):
    '''
    Q1
      \\ 
        Q1C = Q2C
    
    This should be positive with respect to right hand rule
    '''
    # v1 = ml_mol.vector(q1a, c0)
    v1 = ml_mol.vector(c0, q1a)
    v2 = ml_mol.vector(c0, c1)
    # print([ml_mol.get_atom_index(a) for a in [q1a,c0,c1]])
    c = np.cross(v1,v2)
    t_matrix = rotation_matrix_from_vectors(c, np.array([0,0,-1]))
    ml_mol.transform(t_matrix)

def check_q_align(ml_mol: ml.Molecule,q1a:ml.Atom,q2a:ml.Atom,q3a:ml.Atom):
    '''Asserts that the Q1Q2 vector and Q2Q3 vector are negative with respect to the z axis. This confirms that they are ordered correctly.

    Parameters
    ----------
    ml_mol : ml.Molecule
    q1a : ml.Atom
        Q1 Atom
    q2a : ml.Atom
        Q2 Atom
    q3a : ml.Atom
        Q3 Atom
    '''
    # print([ml_mol.get_atom_index(a) for a in [q1a, q2a, q3a, q4a]])
    q1q2v = ml_mol.vector(q1a, q2a)
    q1q3v = ml_mol.vector(q1a, q3a)

    if np.sign(np.dot(q1q2v, q1q3v)) != 1:
        t_matrix = rotation_matrix_from_axis([1,0,0], np.radians(180))
        ml_mol.transform(t_matrix)
            
        q1q2v = ml_mol.vector(q1a, q2a)
        q1q3v = ml_mol.vector(q2a, q3a)

def fix_alkene_types(ml_mol:ml.Molecule, q1a:ml.Atom,q2a:ml.Atom,q3a:ml.Atom,q4a:ml.Atom):
    '''
    This function serves to correct the alkene types based on a new alignment scheme.
    '''
    # {h_count}\n{m.name}\n{m.attrib["_Canonical_SMILES_H"]}
    q_atoms = [q1a,q2a,q3a,q4a]

    #This creates a list of the location of hydrogens (i.e. True if hydrogen, else False)
    h_order = [x.element == ml.Element.H for x in q_atoms]
    h_count = h_order.count(True)

    match h_count:
        #3 hydrogens (Monosubstituted)
        case 3:
            ml_mol.attrib['_Alkene_Type'] = 'Mono'
        #2 hydrogens (Disubstituted)
        case 2:
            match h_order:
                #Hydrogens locate in Q3 and Q4
                case [0,0,1,1]:
                    ml_mol.attrib['_Alkene_Type'] = 'Cis'
                #Hydrogens locate in Q2 and Q4
                case [0,1,0,1]:
                    ml_mol.attrib['_Alkene_Type'] = 'Trans'
                #Hydrogens locate in Q2 and Q3
                case [0,1,1,0]:
                    ml_mol.attrib['_Alkene_Type'] = 'Gem'
        #1 hydrogens (Trisubstituted)
        case 1:
            match h_order:
                #Hydrogen located in Q2
                case [0,1,0,0]:
                    ml_mol.attrib['_Alkene_Type'] = 'TriQ2'
                #Hydrogen located in Q3
                case [0,0,1,0]:
                    ml_mol.attrib['_Alkene_Type'] = 'TriQ3'
                #Hydrogen located in Q4
                case [0,0,0,1]:
                    ml_mol.attrib['_Alkene_Type'] = 'TriQ4'
        #0 hydrogens (Tetrasubstituted)
        case 0:
            ml_mol.attrib['_Alkene_Type'] = 'Tetra'

def reassign_and_realign(alk_mol: AlkeneRel, ml_mol: ml.Molecule, new_q1: ml.Atom) -> None:
    '''Reassigns the quadrants based on the new defined q1 and realigns based on this new assignment

    Parameters
    ----------
    alk_mol : AlkeneRel
        This allows for reassignment of each quadrant based on the q1 input
    ml_mol : ml.Molecule
        Original molecule object
    new_q1 : ml.Atom
        Atom to become the new quadrant

    '''
    # print(ml_mol)
    #Identifies True Q1 Atom and True C0 Atom
    true_q1a = new_q1
    true_c0 = ml_mol.get_atom(true_q1a.attrib['Arb C Con'])

    #Identifies True Q2 Atom and True C1 Atom
    true_q2a = alk_mol.cis_atom(true_q1a)
    true_c1 = ml_mol.get_atom(true_q2a.attrib['Arb C Con'])

    # Identifies True Q3 Atom and True Q4 Atoms
    true_q3a = alk_mol.trans_atom(true_q1a)
    true_q4a = alk_mol.gem_atom(true_q1a)
    
    #The following reassign attributes and alkene types based on the results above.
    true_c0.attrib['C0'] = ml_mol.get_atom_index(true_c0)
    true_c1.attrib['C1'] = ml_mol.get_atom_index(true_c1)

    true_q1a.attrib['Q'] = 1
    true_q2a.attrib['Q'] = 2
    true_q3a.attrib['Q'] = 3
    true_q4a.attrib['Q'] = 4
    
    q_atoms = (true_q1a, true_q2a, true_q3a, true_q4a)
    

    # print([ml_mol.get_atom_index(a) for a in q_atoms])
    if ml_mol.attrib['_Alkene_Type'] == 'Tri':
        h_count = [x.element for x in q_atoms].count(ml.Element.H)

        assert h_count == 1, f'Tri H count = {h_count}\n{m.name}\n{m.attrib["_Canonical_SMILES_H"]}'

        if true_q2a.element == ml.Element.H:
            ml_mol.attrib['_Alkene_Type'] = 'Tri_Q2'
        elif true_q3a.element == ml.Element.H:
            ml_mol.attrib['_Alkene_Type'] = 'Tri_Q3'
        elif true_q4a.element == ml.Element.H:
            ml_mol.attrib['_Alkene_Type'] = 'Tri_Q4'

    ml_mol.attrib['C Order'] = tuple(ml_mol.get_atom_index(x) for x in [true_c0, true_c1])
    ml_mol.attrib['Q Order'] = tuple(ml_mol.get_atom_index(x) for x in q_atoms)
    ml_mol.attrib['True Sterimol'] = {f'Q{i+1}':a.attrib['Sterimol'] for i,a in enumerate(q_atoms)}

    #The following code realigns based on the new values.

    #Sets Alkene Carbon C0 to be the origin
    set_origin(ml_mol, ml_mol.get_atom_index(true_c0))

    #Rotates molecule such that alkene atoms are along the X-axis (C0 --> C1)
    rot_rz(ml_mol, true_c0, true_c1)

    #Rotates molecule such that Q1 and alkene atoms are in the XY plane
    rot_xy(ml_mol, true_q1a, true_c0, true_c1)

    #This asserts that the vectors formed after alignment are correct and rotates 180 degrees if Q1 ends up in below C0.
    check_q_align(ml_mol, true_q1a, true_q2a, true_q3a)

    #This asserts some key aspects of the alkenes to ensure they are behaving correctly
    fix_alkene_types(ml_mol, true_q1a, true_q2a, true_q3a, true_q4a)


In [5]:
#Hydrogen Volume as calculated from the volume reference(5.57 Å^3)
vol_tolerance = 5.57
#Hydrogen ESP99th Percentile has a standard deviation of 3.54 kJ/mol across all hydrogen calculations (1508 calculations)
esp99_tolerance = 3.54
#Hydrogen Polarizability has a standard deviation of 0.09 C m^2 V^-1
pol_tolerance = 0.09

In [6]:
vol_types = ['MaxVol', '3BFSVol']
mlib = ml.MoleculeLibrary('6_6_2_DB_Merged_ESPFix.mlib')

for vol_type in vol_types:
    print(f'Realigning based on {vol_type}')
    realign_mlib = ml.MoleculeLibrary(f'6_7_Realign_{vol_type}.mlib', readonly=False, overwrite=True)
    
    rule_1 = 0
    rule_2 = 0
    rule_3 = 0
    rule_4 = 0
    rule_5 = 0
    rule_6 = 0
    no_change = 0

    with mlib.reading(), realign_mlib.writing():
        for name in tqdm(mlib):

            m = mlib[name]

            #Find Current Q atoms
            for a in m.atoms:
                if 'Q' in a.attrib:
                    qnum = a.attrib['Q']
                    match qnum:
                        case 1:
                            q1a = a
                        case 2:
                            q2a = a
                        case 3:
                            q3a = a
                        case 4:
                            q4a = a
            
            q_atoms = np.array([q1a, q2a, q3a, q4a])

            alk_rel = AlkeneRel(m, q1a, q2a, q3a, q4a)

    #### Rule 1: Substituent with the largest volume represented by the quadrants #####
            #Find all volumes based on current order of q_atoms
            if vol_type == 'MaxVol':
                all_vols = np.array([alk_rel.max_vol(a) for a in q_atoms])
            elif vol_type == '3BFSVol':
                all_vols = np.array([alk_rel.bfs_vol(a) for a in q_atoms])
            max_vol = np.max(all_vols)

            #Finds volumes that are within a tolerance
            comp_max_vol = np.where(all_vols > (max_vol - vol_tolerance))

            vol_match_atoms = q_atoms[comp_max_vol]

            #Ensures that there's only one max volume
            if vol_match_atoms.shape[0] == 1:
                match_atom = vol_match_atoms[0]
                rule_1 += 1
                reassign_and_realign(alk_rel, m, match_atom)
            else:
                
    #### Rule 2: Largest volume substituent with the smallest volume substituent cis#####
                
                #Finds the volumes that are cis substituted with respect to each other
                
                if vol_type == 'MaxVol':
                    cis_vols = np.array([alk_rel.max_vol(alk_rel.cis_atom(a)) for a in vol_match_atoms])
                elif vol_type == '3BFSVol':
                    cis_vols = np.array([alk_rel.bfs_vol(alk_rel.cis_atom(a)) for a in vol_match_atoms])
                min_cis_vol = np.min(cis_vols)
                #Finds minimum volumes that are within a tolerance
                comp_cis_vol = np.where(cis_vols < (min_cis_vol + vol_tolerance))

                
                cis_vol_match_atoms = vol_match_atoms[comp_cis_vol]
            
                #Tests to see if there is only one value in the array
                if cis_vol_match_atoms.shape[0] == 1:
                    match_atom = cis_vol_match_atoms[0]
                    reassign_and_realign(alk_rel, m, match_atom)
                    rule_2 += 1
                else: 
                    
    #### Rule 3: Largest Volume and Smallest 99th Percentile ESP substituent for each quadrant#####
                    
                    #Finds the 99th Percentile ESP that is the lowest (i.e. the most "electron rich" group) as a substitute for polarizability
                    extra = np.array([alk_rel.esp99(a) for a in cis_vol_match_atoms])
                    esp_99s = np.array([alk_rel.esp99(a) for a in cis_vol_match_atoms])

                    min_esp99 = np.min(esp_99s)

                    #Finds 99th Percentile ESP that are within a tolerance
                    comp_esp99s = np.where(esp_99s < (min_esp99 + esp99_tolerance))

                    esp99_match_atoms = cis_vol_match_atoms[comp_esp99s]

                    # Ensures that there's only one ESP 99th Percentile
                    if esp99_match_atoms.shape[0] == 1:
                        match_atom = esp99_match_atoms[0]
                        # print(m.get_atom_index(match_atom))
                        rule_3 += 1
                        reassign_and_realign(alk_rel, m, match_atom)
                        # print()
                        # print(m.dumps_xyz())
                    else:

    #### Rule 4: Largest Volume Substituent and Smallest 99th Percentile ESP substituent with the smallest 99th Percentile ESP substituent trans#####   

                        #Finds the 99th Percentile of ESP that are trans substituted with respect to each other
                        trans_esp99 = np.array([alk_rel.esp99(alk_rel.trans_atom(a)) for a in esp99_match_atoms])
                        # print(trans_esp99)
                        # print([m.get_atom_index(alk_rel.trans_atom(a)) for a in esp99_match_atoms])

                        min_trans_esp99 = np.min(trans_esp99)
                        #Finds minimum 99th Percentile of ESP that are within a tolerance
                        comp_trans_esp99 = np.where(trans_esp99 < (min_trans_esp99 + esp99_tolerance))

                        trans_esp99_match_atoms = esp99_match_atoms[comp_trans_esp99]

                        # Ensures that there's only one ESP 99th Percentile
                        if trans_esp99_match_atoms.shape[0] == 1:
                            match_atom = trans_esp99_match_atoms[0]
                            rule_4 += 1
                            reassign_and_realign(alk_rel, m, match_atom)
                        else:

    #### Rule 5: Largest Polarizability Atom with same previous criteria (volume, trans volume, small ESP99, small ESP99 trans)#####
                            #Finds the 99th Percentile ESP that is the lowest (i.e. the most "electron rich" group) as a substitute for polarizability
                            all_pols = np.array([alk_rel.pol(a) for a in trans_esp99_match_atoms])
                            
                            max_pol = np.max(all_pols)

                            #Finds polarizabilities that are within a tolerance
                            comp_max_pol = np.where(all_pols > (max_pol - pol_tolerance))

                            pol_match_atoms = trans_esp99_match_atoms[comp_max_pol]

                            if pol_match_atoms.shape[0] == 1:
                                match_atom = pol_match_atoms[0]
                                rule_5 += 1
                                reassign_and_realign(alk_rel, m, match_atom)
                            else:
    #### Rule 6: Largest Polarizability Atom Trans to the substituent with previous criteria#####
                                #Finds the 99th Percentile of ESP that are trans substituted with respect to each other
                                trans_pols = np.array([alk_rel.pol(alk_rel.trans_atom(a)) for a in pol_match_atoms])
                                max_trans_pol = np.max(trans_pols)
                                #Finds minimum 99th Percentile of ESP that are within a tolerance
                                comp_trans_pol = np.where(trans_pols < (max_trans_pol + pol_tolerance))

                                trans_pol_match_atoms = pol_match_atoms[comp_trans_pol]
                                if trans_pol_match_atoms.shape[0] == 1:
                                    match_atom = trans_pol_match_atoms[0]
                                    rule_6 += 1
                                    reassign_and_realign(alk_rel, m, match_atom)
                                else:
                                    #This takes the first atom in the maximum volume list and aligns it based on this arbitrary order
                                    match_atom = vol_match_atoms[0]
                                    reassign_and_realign(alk_rel, m, match_atom)
                                    no_change += 1   
            realign_mlib[name] = m

        print(f''' 
        rule 1: {rule_1}
        rule_2: {rule_2}
        rule_3: {rule_3}
        rule_4: {rule_4}
        rule_5: {rule_5}
        rule_6: {rule_6}
        no_change: {no_change}
        ''')

Realigning based on MaxVol


100%|██████████| 789/789 [00:02<00:00, 291.52it/s]


 
        rule 1: 661
        rule_2: 16
        rule_3: 73
        rule_4: 7
        rule_5: 11
        rule_6: 0
        no_change: 21
        
Realigning based on 3BFSVol


100%|██████████| 789/789 [00:02<00:00, 294.28it/s]


 
        rule 1: 542
        rule_2: 53
        rule_3: 151
        rule_4: 9
        rule_5: 8
        rule_6: 0
        no_change: 26
        


## Note about Alignment Scheme

For the alignment scheme here, the Q1 is placed in the top left quadrant of the alkene:

```
Q1           Q2
  \          /
    Q1C = Q2C
  /          \
Q4            Q3
```

The main body of work uses the same ordering of the quadrants, it is just flipped vertically to match the mnemonic during discussions:

```
Q4           Q3
  \          /
    Q1C = Q2C
  /          \
Q1            Q2
```

This does not change any of the conclusions or the actual descriptor calculation/alignment. It is just viewed at a different angle for the purposes of discussion.