# Find CRACMM species based on SMILES

---
    author: Nash Skipper
    date: 2024-02-14
---

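This notebook shows how to use the cracmm_mapper tool to find the CRACMM species that represents a compound, given its SMILES string, its OH rate constant (kOH), and the log10 of its saturation concentration (Cstar).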

In [1]:
import pandas as pd
from cracmm_mapper import get_cracmm_roc

## Install rdkit if not already installed

The cracmm_mapper module depends on [rdkit](https://www.rdkit.org/).

In [2]:
# !python -m pip install --user rdkit

# to install in the current kernel:
# %pip install rdkit
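
Before uncommenting either install command, it can help to check whether rdkit is already visible to the current kernel. The following is a minimal sketch using only the standard library; the variable name `rdkit_available` is chosen here for illustration.

```python
import importlib.util

# find_spec returns None when the package cannot be located in the
# current environment, so this check does not import rdkit itself.
rdkit_available = importlib.util.find_spec("rdkit") is not None
print("rdkit installed:", rdkit_available)
```

If this prints `False`, one of the install commands above is needed.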

## Option 1: Interactively enter species properties

In [3]:
smiles = input('enter SMILES:  ')  # input() already returns a str
kOH = float(input('enter kOH (cm3 molecules-1 s-1):  '))
log10Cstar = float(input('enter log10Cstar (Cstar in ug/m3):  '))
print(f'CRACMM species:  {get_cracmm_roc(smiles, kOH, log10Cstar)}')

enter SMILES:   C=CC1=CC=CC=C1
enter kOH (cm3 molecules-1 s-1):   5.79e-11
enter log10Cstar (Cstar in ug/m3):   7.55


CRACMM species:  STY


## Option 2: Same as option 1 but not interactive

In [4]:
smiles = 'C=CC1=CC=CC=C1'
kOH = 5.79e-11      # [cm3/(molecule*s)]
log10Cstar = 7.55   # [Cstar in ug/m3]
get_cracmm_roc(smiles, kOH, log10Cstar)

'STY'
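
The third argument is the base-10 logarithm of Cstar, not Cstar itself. If only Cstar (in ug/m3) is at hand, the conversion is a single `math.log10` call. The helper name `to_log10_cstar` below is hypothetical and not part of cracmm_mapper, and the input value is illustrative only.

```python
import math

def to_log10_cstar(cstar_ug_m3: float) -> float:
    """Convert Cstar in ug/m3 to log10(Cstar). Hypothetical helper for illustration."""
    if cstar_ug_m3 <= 0:
        # log10 is undefined for zero or negative concentrations
        raise ValueError("Cstar must be positive")
    return math.log10(cstar_ug_m3)

# Illustrative value: a Cstar of 3.55e7 ug/m3 corresponds to log10(Cstar) of about 7.55
log10_value = to_log10_cstar(3.55e7)
print(round(log10_value, 2))
```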

## Option 3: Run multiple species in batch

### Create a pandas DataFrame with species properties

This small DataFrame is for demonstration only. In a more typical application, the SMILES strings, kOH, and log10(Cstar) values would be read from a CSV or Excel file to create the DataFrame.

In [5]:
data = {
    'species':    ['styrene', 'cyclohexane', 'glyoxal'],
    'SMILES':     ['C=CC1=CC=CC=C1', 'C1CCCCC1', 'O=CC=O'],
    'koh':        [5.79e-11, 7.48e-12, 1.14e-11], # [cm3/(molecule*s)]
    'log10cstar': [7.55, 8.64, 8.90] # [Cstar in ug/m3]
}
df = pd.DataFrame(data)
df

| | species | SMILES | koh | log10cstar |
|---|---|---|---|---|
| 0 | styrene | C=CC1=CC=CC=C1 | 5.79e-11 | 7.55 |
| 1 | cyclohexane | C1CCCCC1 | 7.48e-12 | 8.64 |
| 2 | glyoxal | O=CC=O | 1.14e-11 | 8.9 |
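
The same DataFrame can be built from a CSV file instead of a dictionary. The sketch below assumes a file with the four column names used in this notebook. To stay self-contained it reads from an in-memory string through `io.StringIO`; with a real file, its path would be passed to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk, using the same columns as the example above.
csv_text = """species,SMILES,koh,log10cstar
styrene,C=CC1=CC=CC=C1,5.79e-11,7.55
cyclohexane,C1CCCCC1,7.48e-12,8.64
glyoxal,O=CC=O,1.14e-11,8.90
"""

# pd.read_csv accepts a path or any file-like object.
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv.dtypes)
```

For an Excel workbook, `pd.read_excel` plays the same role, provided an Excel engine such as openpyxl is installed.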


### Add column for CRACMM species from cracmm_mapper

In [6]:
df['CRACMM'] = df.apply(lambda x: get_cracmm_roc(x['SMILES'], x['koh'], x['log10cstar']), axis='columns')
df

| | species | SMILES | koh | log10cstar | CRACMM |
|---|---|---|---|---|---|
| 0 | styrene | C=CC1=CC=CC=C1 | 5.79e-11 | 7.55 | STY |
| 1 | cyclohexane | C1CCCCC1 | 7.48e-12 | 8.64 | HC10 |
| 2 | glyoxal | O=CC=O | 1.14e-11 | 8.9 | GLY |
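
For larger batches, a single bad row should not stop the whole run. How `get_cracmm_roc` behaves on malformed input is not shown in this notebook, so the sketch below only assumes that it might raise an exception. The helper `map_species` and the stand-in `placeholder_mapper` are hypothetical names introduced for illustration. In practice, `get_cracmm_roc` would be passed as the `mapper` argument.

```python
import pandas as pd

def map_species(df, mapper, smiles_col="SMILES", koh_col="koh", cstar_col="log10cstar"):
    """Call mapper(smiles, koh, log10cstar) per row; rows where it raises get None."""
    results = []
    for smiles, koh, log10cstar in zip(df[smiles_col], df[koh_col], df[cstar_col]):
        try:
            results.append(mapper(smiles, koh, log10cstar))
        except Exception:
            # Broad catch is deliberate in this sketch: record the failure and move on.
            results.append(None)
    return results

def placeholder_mapper(smiles, koh, log10cstar):
    """Stand-in for get_cracmm_roc; it does NOT perform any real CRACMM mapping."""
    if not smiles:
        raise ValueError("empty SMILES")
    return "OK"

demo = pd.DataFrame({
    "SMILES": ["C1CCCCC1", ""],
    "koh": [7.48e-12, 1.0e-11],
    "log10cstar": [8.64, 5.0],
})
demo["CRACMM"] = map_species(demo, placeholder_mapper)
print(demo)
```

Rows that failed can then be selected with `demo[demo['CRACMM'].isna()]` and inspected separately.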
