# Analysis of Latent Space Reaction Attribute Vector for Ketone Reduced to Alcohol

_Prepared by: Chih-Wei Hsu, Liang Xu, Christine Chang_

### Ketone $(R_2C{=}O)$ $\rightarrow$ Alcohol $(R_2CH{-}OH)$

### Goal

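Our goal is to evaluate how chemical reactions are encoded in DarkChem's latent space. To do this, we explore whether reactions can be represented by attribute vector arithmetic, i.e., whether the difference between the product and reactant latent vectors (the reaction vector) is consistent across different reactions of the same type: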

$\Big( \overrightarrow{product_{1}} - \overrightarrow{reactant_{1}} \Big) = \overrightarrow{reaction} \stackrel{?}{=} \Big( \overrightarrow{product_{2}} - \overrightarrow{reactant_{2}} \Big)$
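
As a minimal sketch of this arithmetic, assuming each molecule has already been encoded to a latent vector, the reaction vector can be taken as the mean of the product-minus-reactant differences over the training reactions, and a product is predicted by adding that vector to a new reactant's latent vector. The helper names and the toy 3-D vectors are hypothetical (real DarkChem latent vectors come from its encoder and are higher-dimensional), and decoding the predicted vector back to SMILES is not shown.

```python
def vec_sub(a, b):
    """Element-wise a - b for equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def vec_add(a, b):
    """Element-wise a + b for equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def reaction_vector(pairs):
    """Mean of (product - reactant) over (reactant_z, product_z) pairs."""
    diffs = [vec_sub(product_z, reactant_z) for reactant_z, product_z in pairs]
    n = len(diffs)
    return [sum(component) / n for component in zip(*diffs)]


def apply_reaction(reactant_z, rxn_vec):
    """Predicted product latent vector = reactant latent vector + reaction vector."""
    return vec_add(reactant_z, rxn_vec)


# Hypothetical toy latent vectors (reactant_z, product_z) for two training reactions.
train_pairs = [
    ([0.0, 1.0, 2.0], [1.0, 1.5, 2.0]),
    ([2.0, 0.0, 1.0], [3.0, 0.5, 1.0]),
]
rxn = reaction_vector(train_pairs)
print(rxn)                                   # [1.0, 0.5, 0.0]
print(apply_reaction([5.0, 5.0, 5.0], rxn))  # [6.0, 5.5, 5.0]
```

If the equation above holds, the individual differences cluster tightly around their mean, so the spread of those differences is a direct measure of how well one vector represents the reaction.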

### Approach

Our approach differs from prior analyses in the following ways:

   1. A **larger training set** was used, consisting of more than 100 reactions (versus 10-25 in prior analyses), which should yield a more robust average reaction vector.
   2. A **similar class of reactants** was evaluated for the same reaction type. In this analysis, only reactions that reduce a carbonyl group (C=O) to a hydroxyl group (C-OH) are included.
   3. A **more detailed classification of reactants** was applied within the same reaction type. For example, reaction vectors for reactants with and without a benzene ring are determined separately.
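
Determining a separate reaction vector per subclass (item 3) amounts to grouping the product-minus-reactant differences by class label before averaging. A minimal sketch, assuming the labels (for example `"linear"` or `"benzene"`) were assigned to each reaction beforehand; the function name and the toy 2-D vectors are hypothetical:

```python
from collections import defaultdict


def per_class_reaction_vectors(reactions):
    """Average (product - reactant) separately for each reactant class.

    `reactions` is an iterable of (class_label, reactant_z, product_z) tuples.
    Returns {class_label: mean reaction vector}.
    """
    diffs = defaultdict(list)
    for label, reactant_z, product_z in reactions:
        diffs[label].append([p - r for r, p in zip(reactant_z, product_z)])
    return {
        label: [sum(component) / len(vectors) for component in zip(*vectors)]
        for label, vectors in diffs.items()
    }


# Hypothetical 2-D latent vectors with pre-assigned class labels.
reactions = [
    ("linear", [0.0, 0.0], [1.0, 0.0]),
    ("linear", [1.0, 1.0], [2.0, 1.0]),
    ("benzene", [0.0, 0.0], [0.0, 2.0]),
]
print(per_class_reaction_vectors(reactions))
# {'linear': [1.0, 0.0], 'benzene': [0.0, 2.0]}
```

Each subclass then gets its own vector, so a test reactant is shifted by the vector of the class it belongs to rather than by a single global average.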
   
### Key Findings

* Without classification, the training reaction vectors show a high standard deviation in angle, so the average reaction vector is not very reliable.
    * The standard deviations of magnitude and angle are 0.8465 and 9.4492, respectively.
    * Of the 10 test predictions, only 1 is exact; 6 are off by one carbon, and the remaining 3 fail.
* We therefore classified the reactions into several subgroups, such as linear reactants without branches, linear reactants with one branch, and reactants with benzene.
    * For linear reactants, the standard deviation of the angle does decrease (1.8938 for unbranched reactants vs. 9.4492 overall), but the predictions are still poor: for unbranched reactants, 4 of 6 predictions are off by one carbon and the remaining 2 fail.
    * For reactants with benzene, the standard deviation of the angle remains high (9.3045), but, interestingly, all 5 predictions are off by exactly one carbon from the actual product.

#### Unclassified Reactants

* Training reactions: 106
* Testing reactions: 10

Magnitude and angle from training reactions:

Statistic|Magnitude|  Angle|
---------|---------|-------|
      Avg|   2.4788|24.7259|
      Std|   0.8465| 9.4492|
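
The report does not spell out how magnitude and angle are computed. One plausible reading, sketched below purely as an assumption, is that magnitude is the Euclidean norm of each training reaction vector and angle is the angle (in degrees) between each reaction vector and the average reaction vector, with the population standard deviation (`statistics.pstdev`) used for the spread. The function names are hypothetical.

```python
import math
import statistics


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def angle_deg(a, b):
    """Angle between two non-zero vectors, in degrees."""
    cos = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    cos = max(-1.0, min(1.0, cos))  # clamp rounding error before acos
    return math.degrees(math.acos(cos))


def reaction_vector_stats(diffs):
    """(mean, population std) of magnitudes and of angles to the mean vector."""
    n = len(diffs)
    mean_vec = [sum(component) / n for component in zip(*diffs)]
    magnitudes = [norm(d) for d in diffs]
    angles = [angle_deg(d, mean_vec) for d in diffs]
    return {
        "magnitude": (statistics.mean(magnitudes), statistics.pstdev(magnitudes)),
        "angle": (statistics.mean(angles), statistics.pstdev(angles)),
    }


# Two hypothetical reaction vectors at right angles: each is 45 degrees from their mean.
stats = reaction_vector_stats([[1.0, 0.0], [0.0, 1.0]])
print(stats["magnitude"])  # (1.0, 0.0)
print(stats["angle"])      # approximately (45.0, 0.0)
```

Under this reading, a low angle standard deviation means the individual reaction vectors point in nearly the same direction as their average, which is the property the Key Findings use to judge reliability.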


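Predictions for the testing reactions (`OK`: exact match; `+C`: the closest predicted product has one extra carbon; `Failed`: no close prediction was found):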

Index|             Reactant|             Product|    Closest predicted product|  Result|
-----|---------------------|--------------------|-----------------------------|--------|
    1|     CCC1CC(=O)CCN1CC|     CCC1CC(O)CCN1CC|-----------------------------|  Failed|
    2|      CCCOC(=O)C(=O)C|      CCCOC(=O)C(O)C|              CCCOC(=O)C(CO)C|      +C|
    3|       O=C1C(C)CCCC1C|       CC1CCCC(C1O)C|               OCC1C(C)CCCC1C|      +C|
    4|       CCCCCCCCC(=O)C|       CCCCCCCCC(O)C|               CCCCCCCCC(CO)C|      +C|
    5|   N#Cc1cccc2c1CCC2=O|   N#Cc1cccc2c1CCC2O|            N#Cc1cccc2c1CCC2O|      OK|
    6|  CCOC(=O)C(C(=O)C)CC|  CCOC(=O)C(C(O)C)CC|-----------------------------|  Failed|
    7|   CCC(C(=O)OC)C(=O)C|   CCC(C(=O)OC)C(O)C|           CCC(C(=O)OC)C(CO)C|      +C|
    8|       CCn1cc(nc1)C=O|       OCc1ncn(c1)CC|-----------------------------|  Failed|
    9|    O=C1OCCC2(C1)CCC2|    OC1OCCC2(C1)CCC2|            OCC1OCCC2(C1)CCC2|      +C|
   10|O=C1CCc2c(C1)c(N)ccc2|OC1CCc2c(C1)c(N)ccc2|        OCC1CCc2c(C1)c(N)ccc2|      +C|
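
The outcome labels in the last column of the table above can be reproduced with a simple rule, sketched here under two assumptions: the SMILES strings are canonical, so identical molecules compare equal as strings, and carbon atoms can be counted with a crude character scan (a real analysis would use a cheminformatics toolkit such as RDKit). Both helper functions are hypothetical and not part of DarkChem, and the `"Other"` label is an illustrative addition for cases the tables do not contain.

```python
def count_carbons(smiles):
    """Crude carbon count: aliphatic 'C' plus aromatic 'c', skipping 'Cl'.

    Heuristic only: bracket atoms such as [Ca], [Cu] or [Sc] are miscounted.
    """
    count = 0
    for i, ch in enumerate(smiles):
        if ch == "c":
            count += 1
        elif ch == "C" and smiles[i + 1:i + 2] != "l":
            count += 1
    return count


def classify_prediction(product, predicted):
    """Return "OK", "+C", "Failed", or "Other" for one test reaction.

    Assumes canonical SMILES, so an exact string match means the same molecule.
    `predicted` is None when no close prediction was found.
    """
    if predicted is None:
        return "Failed"
    if predicted == product:
        return "OK"
    if count_carbons(predicted) - count_carbons(product) == 1:
        return "+C"
    return "Other"


# Rows 4, 5 and 1 of the unclassified test table.
print(classify_prediction("CCCCCCCCC(O)C", "CCCCCCCCC(CO)C"))          # +C
print(classify_prediction("N#Cc1cccc2c1CCC2O", "N#Cc1cccc2c1CCC2O"))  # OK
print(classify_prediction("CCC1CC(O)CCN1CC", None))                   # Failed
```

Note that the `+C` rule only compares carbon counts; it does not check where the extra carbon sits. In every `+C` row above, the prediction places a CH2 between the carbon and the hydroxyl group (`C(CO)` instead of `C(O)`).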

#### Linear reactants without branches

* Training reactions: 9
* Testing reactions: 6

Statistic|Magnitude|  Angle|
---------|---------|-------|
      Avg|   1.3811|12.8232|
      Std|   0.2176| 1.8938|


Predictions of testing reactions:

Index|             Reactant|             Product|    Closest predicted product|  Result|
-----|---------------------|--------------------|-----------------------------|--------|
    1|           CCCC(=O)CC|           CCCC(CC)O|                   CCCC(CO)CC|      +C|
    2|        C/C=C/C(=O)CC|        C/C=C/C(CC)O|                C/C=C/C(CO)CC|      +C|
    3|       C/C=C/C(=O)C#N|       C/C=C/C(C#N)O|-----------------------------|  Failed|
    4|          O=CCC/C=C/C|          C/C=C/CCCO|-----------------------------|  Failed|
    5|            CCCC#CC=O|            CCCC#CCO|                    CCCC#CCCO|      +C|
    6|           CCNCC(=O)C|           CCNCC(O)C|                   CCNCC(CO)C|      +C|

#### Reactants with benzene

* Training reactions: 31
* Testing reactions: 5

Statistic|Magnitude|  Angle|
---------|---------|-------|
      Avg|   2.5210|25.6767|
      Std|   0.7087| 9.3045|


Predictions of testing reactions:

Index|             Reactant|             Product|    Closest predicted product|  Result|
-----|---------------------|--------------------|-----------------------------|--------|
    1|         O=CCc1ccccc1|         OCCc1ccccc1|                 OCCCc1ccccc1|      +C|
    2| O=CCc1cccc(c1)C(=O)C| OCCc1cccc(c1)C(=O)C|         OCCCc1cccc(c1)C(=O)C|      +C|
    3|     CCc1ccccc1C(=O)C|     CCc1ccccc1C(O)C|             OCC(c1ccccc1CC)C|      +C|
    4|   CCc1ccc(cc1)C(=O)C|   CCc1ccc(cc1)C(O)C|           OCC(c1ccc(cc1)CC)C|      +C|
    5|      CCc1ccc(cc1)C=O|      CCc1ccc(cc1)CO|              OCCc1ccc(cc1)CC|      +C|
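
As a consistency check on the Key Findings, the outcome labels from the three prediction tables above can be tallied directly. The lists below are copied row by row from the outcome column of each table; nothing else is assumed.

```python
from collections import Counter

# Outcome column of each prediction table, copied row by row.
results = {
    "unclassified": ["Failed", "+C", "+C", "+C", "OK", "Failed", "+C", "Failed", "+C", "+C"],
    "linear, no branch": ["+C", "+C", "Failed", "Failed", "+C", "+C"],
    "benzene": ["+C", "+C", "+C", "+C", "+C"],
}

for group, labels in results.items():
    tally = Counter(labels)  # missing labels count as 0
    print(f"{group}: OK={tally['OK']}, +C={tally['+C']}, Failed={tally['Failed']}")
```

This prints 1 exact, 6 `+C`, and 3 failed predictions for the unclassified set; 0, 4, and 2 for unbranched linear reactants; and 0, 5, and 0 for reactants with benzene, matching the counts stated in the Key Findings.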