# A notebook to demonstrate the different kinetics sources for uncertainty

In [1]:
import os
import re
import rmgpy.chemkin
import rmgpy.data.rmg
import rmgpy.data.kinetics.family

import importlib
importlib.reload(rmgpy.data.kinetics.family)

<module 'rmgpy.data.kinetics.family' from '/home/moon/rmg/RMG-Py/rmgpy/data/kinetics/family.py'>

In [2]:
# pick an annotated chemkin file to analyze
chemkin_file = '/home/moon/uncertainty_estimator/uncertainty_tool_dev/ethane_limit_families/chemkin/chem_annotated.inp'
dict_file = '/home/moon/uncertainty_estimator/uncertainty_tool_dev/ethane_limit_families/chemkin/species_dictionary.txt'
species_list, reaction_list = rmgpy.chemkin.load_chemkin_file(chemkin_file, dict_file)

In [3]:
database = rmgpy.data.rmg.RMGDatabase()
thermo_libraries = [
    'primaryThermoLibrary',
    'BurkeH2O2'
]
reaction_libraries = [
    'BurkeH2O2inN2'
]
kinetics_families = [
    'Disproportionation',
    'H_Abstraction',
    'intra_H_migration',
    'R_Recombination',
    'Intra_Disproportionation',
]

database.load(
    path = rmgpy.settings['database.directory'],
    thermo_libraries = thermo_libraries,
    transport_libraries = [],
    reaction_libraries = reaction_libraries,
    seed_mechanisms = [],
    kinetics_families = kinetics_families,
    kinetics_depositories = ['training'],
    depository = False, # Don't bother loading the depository information, as we don't use it
)


for family in database.kinetics.families:
    if not database.kinetics.families[family].auto_generated:
        database.kinetics.families[family].add_rules_from_training(thermo_database=database.thermo)
        database.kinetics.families[family].fill_rules_by_averaging_up(verbose=True)


In [4]:
sources = {}

for i in range(len(reaction_list)):
    if not hasattr(reaction_list[i], 'family'):
        continue
    family = reaction_list[i].family
    src = database.kinetics.families[family].extract_source_from_comments(reaction_list[i])
    sources[i] = src
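
The later cells index each entry of `sources` as `src[0]`, `src[1][0]` and `src[1][1][...]`. The sketch below collects those checks into one helper. The layout it assumes (a boolean, then a list holding the family label and a details dictionary with `'node'` and `'exact'` keys) is inferred only from how this notebook indexes the result, and the example entries, including `ExampleFamily`, are made up for illustration.

```python
def classify_source(src):
    """Label one `sources` entry by how its kinetics were obtained.

    Assumes src = (exact_training_match, [family, details, ...]), which is
    the layout the notebook cells rely on, not a documented guarantee.
    """
    exact_training_match = src[0]
    family, details = src[1][0], src[1][1]
    if exact_training_match:
        return family, 'training reaction'
    if details.get('node'):
        return family, 'autogenerated tree node'
    if details.get('exact'):
        return family, 'exact rate rule'
    return family, 'rate rule estimate'


# Hypothetical entries that mimic the four cases shown in this notebook
examples = {
    0: (True, ['R_Recombination', None]),
    23: (False, ['ExampleFamily', {'node': 'Root_Ext-2R!H-R_2R!H->C_4R->C', 'exact': False}]),
    33: (False, ['H_Abstraction', {'node': '', 'exact': True}]),
    6: (False, ['H_Abstraction', {'node': '', 'exact': False}]),
}

for index, src in examples.items():
    print(index, classify_source(src))
```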

### Exact match for training reaction in family

In [5]:
# Exact match for training reaction
# for i in range(len(reaction_list)):
for i in range(5):
    if i not in sources.keys():
        continue
    
    src = sources[i]
    exact_training_match = src[0]
    family = src[1][0]
    if exact_training_match:
        print(i, reaction_list[i].kinetics.comment)
        print()


0 Matched reaction 9 CH3 + CH3 <=> C2H6 in R_Recombination/training
This reaction matched rate rule [Root_N-1R->H_N-1CNOS->N_N-1COS->O_1CS->C_N-1C-inRing]
family: R_Recombination

1 Matched reaction 215 C2H6 + CH3_r3 <=> C2H5b + CH4 in H_Abstraction/training
This reaction matched rate rule [C/H3/Cs\H3;C_methyl]
family: H_Abstraction

2 Matched reaction 10 CH3 + C2H5 <=> C3H8 in R_Recombination/training
This reaction matched rate rule [Root_N-1R->H_N-1CNOS->N_N-1COS->O_1CS->C_N-1C-inRing_Ext-2R-R_Sp-3R!H-2R_3R!H->C_2R->C]
family: R_Recombination

3 Matched reaction 5 CH3_r1 + C2H5 <=> CH4 + C2H4 in Disproportionation/training
This reaction matched rate rule [Root_N-4R->H_4CNOS-u1_N-1R!H->O_N-4CNOS->O_4CNS->C_1CNS->C_Sp-2R!H-1C_2R!H->C]
family: Disproportionation

4 Matched reaction 6 C2H5 + C2H5-2 <=> C2H6 + C2H4 in Disproportionation/training
This reaction matched rate rule [Root_N-4R->H_4CNOS-u1_N-1R!H->O_N-4CNOS->O_Ext-4CNS-R_N-Sp-5R!H#4CCCNNNSSS_N-2R!H->S_N-5R!H->O_Sp-5CS-4CCNSS_1CN

### Autogenerated family node

In [6]:
# Kinetics estimated from a node of an autogenerated family tree
# for i in range(len(reaction_list)):
for i in range(25):
    if i not in sources.keys():
        continue
    
    src = sources[i]
    exact_training_match = src[0]
    if exact_training_match:
        continue
    
    if src[1][1]['node']:
        print(i, reaction_list[i].kinetics.comment)
        print()


8 Estimated from node Root_N-4R->H_4CNOS-u1_N-1R!H->O_N-4CNOS->O_Ext-4CNS-R_N-Sp-5R!H#4CCCNNNSSS_N-2R!H->S_N-5R!H->O_Sp-5CS-4CCNSS_1CNS->C_Ext-5CS-R
Multiplied by reaction path degeneracy 3.0

9 Estimated from node Root_N-1R->H_N-1CNOS->N_N-1COS->O_1CS->C_N-1C-inRing_Ext-2R-R_Ext-3R!H-R_N-Sp-3R!H=2R

22 Estimated from node Root_Ext-1R!H-R_N-4R->O_N-Sp-5R!H=1R!H_Ext-4CHNS-R_N-6R!H->S_4CHNS->C_N-Sp-6BrBrBrCCCClClClFFFIIINNNOOOPPPSiSiSi#4C_6BrCClFINOPSi->C_N-1R!H-inRing_Sp-6C-4C_Ext-6C-R
Multiplied by reaction path degeneracy 2.0

23 Estimated from node Root_Ext-2R!H-R_2R!H->C_4R->C
Multiplied by reaction path degeneracy 6.0

24 Estimated from node Root_Ext-2R!H-R_2R!H->C_4R->C
Multiplied by reaction path degeneracy 6.0



### Exact match for a family's rate rule

In [7]:
# Exact match for a rate rule (reactions that matched a training reaction directly are skipped)
for i in range(len(reaction_list)):
    if i not in sources.keys():
        continue
    
    src = sources[i]
    exact_training_match = src[0]
    if exact_training_match:
        continue
    exact_rule_match = src[1][1]['exact']
    family = src[1][0]
    if exact_rule_match:
        assert src[1][1]['rules']
        print(i, reaction_list[i].kinetics.comment)
        print()


33 From training reaction 114 used for C/H3/Cs;C_methyl
Exact match found for rate rule [C/H3/Cs;C_methyl]
Euclidian distance = 0
Multiplied by reaction path degeneracy 3.0
family: H_Abstraction

62 From training reaction 1566 used for Cd/H2/NonDeC;C_methyl
Exact match found for rate rule [Cd/H2/NonDeC;C_methyl]
Euclidian distance = 0
Multiplied by reaction path degeneracy 2.0
family: H_Abstraction

63 From training reaction 343 used for Cd/H2/NonDeC;C_rad/H2/Cs\H3
Exact match found for rate rule [Cd/H2/NonDeC;C_rad/H2/Cs\H3]
Euclidian distance = 0
Multiplied by reaction path degeneracy 2.0
family: H_Abstraction

66 From training reaction 1567 used for Cd/H2/NonDeC;C_rad/H/Cs\H3/Cs\H3
Exact match found for rate rule [Cd/H2/NonDeC;C_rad/H/Cs\H3/Cs\H3]
Euclidian distance = 0
Multiplied by reaction path degeneracy 2.0
family: H_Abstraction

67 From training reaction 177 used for Cd/H2/NonDeC;Cd_Cd\H2_pri_rad
Exact match found for rate rule [Cd/H2/NonDeC;Cd_Cd\H2_pri_rad]
Euclidian distanc

### Combination of rate rules

#### Estimated using template

This means the combination of groups does not exist as a rule, so we moved up one of the group subtrees to more generic groups until we found a rule that exists.

For example, there is no rate rule for [C/H3/Cs\H2\Cs;C_methyl], so you have to fall back to the more generic [C/H3/Cs\OneNonDe;C_methyl] rule, which has data (only because it averaged up the nodes below it).
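
The fallback described above can be sketched with a toy tree. This is a simplified illustration, not RMG's actual search: it climbs only the first group's subtree, and the `parent` links and the `rules` set are hypothetical stand-ins for the H_Abstraction tree.

```python
# Hypothetical parent links for one branch of the first group subtree
parent = {
    r'C/H3/Cs\H2\Cs': r'C/H3/Cs\OneNonDe',
    r'C/H3/Cs\OneNonDe': 'C/H3/Cs',
    'C/H3/Cs': None,
}

# Hypothetical set of rule labels that actually hold data
rules = {r'C/H3/Cs\OneNonDe;C_methyl'}


def fall_back(group, other, rules, parent):
    """Climb from `group` towards the root until `group;other` is a rule.

    Returns the matching rule label (or None) and the number of steps climbed.
    """
    steps = 0
    while group is not None:
        label = f'{group};{other}'
        if label in rules:
            return label, steps
        group = parent.get(group)
        steps += 1
    return None, steps


label, steps = fall_back(r'C/H3/Cs\H2\Cs', 'C_methyl', rules, parent)
print(label, steps)
```

In this toy tree the requested template has no rule of its own, so the search stops one level up at the more generic group.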

In [47]:
for i in range(len(reaction_list)):
    if 'Estimated using template' in reaction_list[i].kinetics.comment:
        print(i, reaction_list[i].kinetics.comment)
        family = sources[i][1][0]
        rule_entry_name = ';'.join([template.label for template in sources[i][1][1]['template']])
        assert rule_entry_name not in database.kinetics.families[reaction_list[i].family].rules.entries
        print()

6 Estimated using template [C/H3/Cs\OneNonDe;C_methyl] for rate rule [C/H3/Cs\H2\Cs;C_methyl]
Euclidian distance = 1.0
Multiplied by reaction path degeneracy 6.0
family: H_Abstraction

36 Estimated using template [C/H3/Cs;C_rad/H2/Cs] for rate rule [C/H3/Cs\H2\Cs;C_rad/H2/Cs]
Euclidian distance = 2.0
Multiplied by reaction path degeneracy 12.0
family: H_Abstraction

65 Estimated using template [C/H3/Cs\H2\Cs;Cd_rad] for rate rule [C/H3/Cs\H2\Cs;Cd_Cd\H\Cs_pri_rad]
Euclidian distance = 2.0
Multiplied by reaction path degeneracy 6.0
family: H_Abstraction



In [49]:
database.kinetics.families[reaction_list[6].family].rules.entries[r'C/H3/Cs\OneNonDe;C_methyl'][0].data

ArrheniusEP(A=(0.666667,'cm^3/(mol*s)'), n=3.57, alpha=0, E0=(32287.9,'J/mol'), comment="""Average of [From training reaction 232 used for C/H3/Cs\H2\O;C_methyl]""")

#### Estimated using average of templates

This means the combination of groups does not match a rule entry, and we had to use more generic groups on both subtrees. The data is an average of the more generic rules matched on each subtree.
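
As a rough illustration of what averaging two templates means numerically, the sketch below averages toy Arrhenius-style parameters. It assumes a geometric mean for `A` and arithmetic means for `n` and `E0`, which is my understanding of how RMG averages rate rules rather than something this notebook confirms. `ToyKinetics` and all numbers are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyKinetics:
    """Hypothetical stand-in for a rate rule's kinetics data."""
    A: float   # pre-exponential factor
    n: float   # temperature exponent
    E0: float  # intrinsic barrier, J/mol


def average_kinetics(kinetics_list):
    """Average A geometrically (mean of log10 A) and n, E0 arithmetically."""
    count = len(kinetics_list)
    log_a = sum(math.log10(k.A) for k in kinetics_list) / count
    n = sum(k.n for k in kinetics_list) / count
    e0 = sum(k.E0 for k in kinetics_list) / count
    return ToyKinetics(A=10 ** log_a, n=n, E0=e0)


# Two made-up rules, one matched on each subtree
first = ToyKinetics(A=1e2, n=1.0, E0=10000.0)
second = ToyKinetics(A=1e4, n=3.0, E0=30000.0)
averaged = average_kinetics([first, second])
print(averaged)
```

Averaging `log10(A)` keeps a single very large pre-exponential factor from dominating the result, which is why a geometric mean is the natural choice here.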

In [50]:
for i in range(len(reaction_list)):
    if 'Estimated using average of templates' in reaction_list[i].kinetics.comment:
        print(i, reaction_list[i].kinetics.comment)
        print()

18 Estimated using average of templates [C/H3/Cs;Cd_Cd\H2_pri_rad] + [C/H3/Cs\H2\Cs;Cd_rad] for rate rule [C/H3/Cs\H2\Cs;Cd_Cd\H2_pri_rad]
Euclidian distance = 2.0
Multiplied by reaction path degeneracy 6.0
family: H_Abstraction



#### Estimated using an average for rate rules

This means that the original groups being queried exist as a rule on the tree, but that rule only has data because it averaged the results from other nodes. There are no training reactions matched most specifically to the given template.
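
The distinction can be checked from the comment attached to each rule's data, which is what the assertion in the next cell does. Here is a minimal stand-alone sketch of that check using a plain dictionary in place of the RMG rules object. The dictionary layout and `is_averaged_rule` are hypothetical, and the comment strings are copied from or modeled on the output shown in this notebook.

```python
# Hypothetical mapping from rule label to the comments of its data entries
rule_comments = {
    r'C/H3/Cs\OneNonDe;C_methyl': [
        r'Average of [From training reaction 232 used for C/H3/Cs\H2\O;C_methyl]',
    ],
    'C/H3/Cs;C_methyl': [
        'From training reaction 114 used for C/H3/Cs;C_methyl',
    ],
}


def is_averaged_rule(label, rule_comments):
    """Return True when every data entry of the rule came from averaging."""
    return all('Average of' in comment for comment in rule_comments[label])


for label in rule_comments:
    print(label, is_averaged_rule(label, rule_comments))
```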

In [58]:
for i in range(len(reaction_list)):
    if 'Estimated using an average for rate rule' in reaction_list[i].kinetics.comment:
        template_name = ';'.join([template.label for template in sources[i][1][1]['template']])
        for j in range(len(database.kinetics.families[reaction_list[i].family].rules.entries[template_name])):
            assert 'Average of' in database.kinetics.families[reaction_list[i].family].rules.entries[template_name][j].data.comment

        print(i, reaction_list[i].kinetics.comment)
        print()

7 Estimated using an average for rate rule [C/H3/Cs\H3;C_rad/H2/Cs]
Euclidian distance = 0
Multiplied by reaction path degeneracy 6.0
family: H_Abstraction

14 Estimated using an average for rate rule [C/H2/Cs\H3/Cs\H3;C_rad/H2/Cs]
Euclidian distance = 0
Multiplied by reaction path degeneracy 2.0
family: H_Abstraction

34 Estimated using an average for rate rule [C/H3/Cs\H3;C_rad/H2/Cs]
Euclidian distance = 0
Multiplied by reaction path degeneracy 12.0
family: H_Abstraction

37 Estimated using an average for rate rule [C/H2/Cs\H3/Cs\H3;C_rad/H2/Cs]
Euclidian distance = 0
Multiplied by reaction path degeneracy 4.0
family: H_Abstraction

38 Estimated using an average for rate rule [R2radExo;Y_rad;XH_Rrad]
Euclidian distance = 0
Multiplied by reaction path degeneracy 4.0
family: Intra_Disproportionation

44 Estimated using an average for rate rule [C/H3/Cd\H_Cd\H2;C_rad/H2/Cs]
Euclidian distance = 0
Multiplied by reaction path degeneracy 3.0
family: H_Abstraction

55 Estimated using an av

In [30]:
# Any other "estimated using"
for i in range(len(reaction_list)):
    if 'Estimated using ' in reaction_list[i].kinetics.comment:
        print(i, reaction_list[i].kinetics.comment)
        print()

6 Estimated using template [C/H3/Cs\OneNonDe;C_methyl] for rate rule [C/H3/Cs\H2\Cs;C_methyl]
Euclidian distance = 1.0
Multiplied by reaction path degeneracy 6.0
family: H_Abstraction

7 Estimated using an average for rate rule [C/H3/Cs\H3;C_rad/H2/Cs]
Euclidian distance = 0
Multiplied by reaction path degeneracy 6.0
family: H_Abstraction

14 Estimated using an average for rate rule [C/H2/Cs\H3/Cs\H3;C_rad/H2/Cs]
Euclidian distance = 0
Multiplied by reaction path degeneracy 2.0
family: H_Abstraction

18 Estimated using average of templates [C/H3/Cs;Cd_Cd\H2_pri_rad] + [C/H3/Cs\H2\Cs;Cd_rad] for rate rule [C/H3/Cs\H2\Cs;Cd_Cd\H2_pri_rad]
Euclidian distance = 2.0
Multiplied by reaction path degeneracy 6.0
family: H_Abstraction

34 Estimated using an average for rate rule [C/H3/Cs\H3;C_rad/H2/Cs]
Euclidian distance = 0
Multiplied by reaction path degeneracy 12.0
family: H_Abstraction

36 Estimated using template [C/H3/Cs;C_rad/H2/Cs] for rate rule [C/H3/Cs\H2\Cs;C_rad/H2/Cs]
Euclidian dis