# Map BEIS and MEGAN species to CRACMM

---
    author: Havala Pye
    date: 2024-08-08
---

Identify the CRACMM species for each BEIS and MEGAN species using the CRACMM mapper, and compare the result with the species already listed in the mapping files. The `cracmm_mapper` module depends on [rdkit](https://www.rdkit.org/).

## Setup

In [1]:
import pandas as pd
from datetime import datetime, timedelta, date
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import SimilarityMaps
import numpy as np

In [2]:
## Install rdkit if not already installed

# !python -m pip install --user rdkit

# to install in the current kernel:
# %pip install rdkit
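
Before running either install command above, it can help to check whether rdkit is already importable in the current kernel. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of rdkit or the CRACMM utilities.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

if is_installed("rdkit"):
    print("rdkit is available")
else:
    print("rdkit not found; run one of the install commands above")
```

The check only looks the module up on the import path, so it does not confirm that the installed version works with the mapper.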

In [3]:
# set location of mapper downloaded from https://github.com/USEPA/CRACMM/
import sys
utildir = '/work/MOD3DEV/has/2023cracmm_ages/structurecuration/utilities/'   
sys.path.append(utildir)

# Import the python utilities
import cracmm_mapper as cracmm2   # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)
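
If the `cracmm_mapper` import fails, one likely cause is that `utildir` does not point at the folder that holds the utilities. A hedged sketch of a check to run before the import follows; `add_module_dir` is a hypothetical helper, the file name `cracmm_mapper.py` is assumed from the import statement above, and `/path/to/utilities` is a placeholder.

```python
import os
import sys

def add_module_dir(directory, module_file):
    """Append directory to sys.path only if it contains module_file.

    Returns True when the file was found, False otherwise.
    """
    if not os.path.isfile(os.path.join(directory, module_file)):
        return False
    if directory not in sys.path:
        sys.path.append(directory)
    return True

# Placeholder path; substitute your own utildir.
if not add_module_dir('/path/to/utilities', 'cracmm_mapper.py'):
    print('cracmm_mapper.py not found; check utildir')
```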

In [4]:
datadir = '/work/MOD3DEV/has/2023cracmm_ages/structurecuration/data/'    # data files of mappings
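
The cells below build file names with `datadir + 'name.csv'`, which only works while `datadir` ends in a slash. As an optional alternative, here is a small sketch that joins the two parts with `pathlib`; the helper `data_file` and the path `/path/to/data` are illustrative placeholders, not part of the original workflow.

```python
from pathlib import PurePosixPath

def data_file(datadir, name):
    """Join a directory and a file name, with or without a trailing slash."""
    return str(PurePosixPath(datadir) / name)

# Both calls produce the same path.
print(data_file('/path/to/data', 'bvoc_beis_tocracmm2alpha.csv'))
print(data_file('/path/to/data/', 'bvoc_beis_tocracmm2alpha.csv'))
```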

In [5]:
pd.set_option('display.max_rows', None)

### BEIS
The input BEIS mapping is from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings

In [7]:
filename = datadir + 'bvoc_beis_tocracmm2alpha.csv' 
dfbeis = pd.read_csv(filename)

# Check the file's mappings against the mapper. Each mismatch prints as:
# species name, CRACMM2 value in the file, value returned by the mapper.
for idx, row in dfbeis.iterrows():
    query_smiles = str(row['SMILES'])
    koh = row['ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED']
    log10cstar = row['log10Cstar_ugm3']
    roc = cracmm2.get_cracmm_roc(query_smiles, koh, log10cstar, phase='gas')
    if row['CRACMM2'] != roc:
        print(row['SPECIES_NAME'], row['CRACMM2'], roc)
    dfbeis.loc[idx, 'CRACMM2'] = roc   # overwrite with the mapper's assignment
# Write the updated mappings to a new file
dfbeis.to_csv(datadir + 'bvoc_beis_tocracmm.csv', index=False)

para-cymene ROCP6ARO VROCP6ARO
carbon monoxide SLOWROC UNKCRACMM
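
The loop above prints each disagreement as it is found. When the list is long, it can be easier to collect the disagreements and inspect them afterwards. The sketch below shows the same compare-and-record pattern on plain dictionaries so that it runs without pandas, rdkit, or the CRACMM utilities. `find_mismatches` and `stand_in_mapper` are hypothetical names, and the rows and the labels `LABEL_A` and `LABEL_B` are placeholders invented for illustration, not real CRACMM output.

```python
def find_mismatches(rows, mapper, name_key, label_key):
    """Return (name, label_in_file, label_from_mapper) for rows that disagree."""
    mismatches = []
    for row in rows:
        new_label = mapper(row)
        if row[label_key] != new_label:
            mismatches.append((row[name_key], row[label_key], new_label))
    return mismatches

# Placeholder standing in for cracmm2.get_cracmm_roc; it ignores chemistry entirely.
def stand_in_mapper(row):
    return 'LABEL_B' if row['SPECIES_NAME'] == 'species_2' else 'LABEL_A'

rows = [
    {'SPECIES_NAME': 'species_1', 'CRACMM2': 'LABEL_A'},
    {'SPECIES_NAME': 'species_2', 'CRACMM2': 'LABEL_A'},
]
for name, old, new in find_mismatches(rows, stand_in_mapper, 'SPECIES_NAME', 'CRACMM2'):
    print(name, old, new)
```

With a real DataFrame, the collected tuples could be turned into a table and saved alongside the updated mapping file.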




### MEGAN
The input MEGAN mapping is from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings

In [8]:
filename = datadir + 'bvoc_megan_tocracmm2alpha.csv'
dfmegan = pd.read_csv(filename)

# Check the file's mappings against the mapper. Each mismatch prints as:
# representative compound name, CRACMM2 value in the file, value returned by the mapper.
for idx, row in dfmegan.iterrows():
    query_smiles = str(row['SMILES'])
    koh = row['ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED']
    log10cstar = row['log10Cstar_ugm3']
    roc = cracmm2.get_cracmm_roc(query_smiles, koh, log10cstar, phase='gas')
    if row['CRACMM2'] != roc:
        print(row['REPRESENTATIVE_COMPOUND_NAME'], row['CRACMM2'], roc)
    dfmegan.loc[idx, 'CRACMM2'] = roc   # overwrite with the mapper's assignment
# Write the updated mappings to a new file
dfmegan.to_csv(datadir + 'bvoc_megan_tocracmm.csv', index=False)

p-Cymene ROCP6ARO VROCP6ARO
Estragole ROCP6ARO VROCP6ARO
beta-Ionone ROCP6ARO VROCP6ARO
1-Octen-3-ol ROCP6ARO VROCP6ARO
Farnesol ROCP5ARO VROCP5ARO
cis-Nerolidol ROCP5ARO VROCP5ARO
trans-Nerolidol ROCP5ARO VROCP5ARO
2-Ethylhexyl salicylate ROCP2OXY2 VROCP2OXY2
(-)-alpha-Cadinol ROCP5ARO VROCP5ARO
(+)-Cedrol ROCP5OXY1 VROCP5OXY1
3,3,5-Trimethylcyclohexyl salicylate ROCP1OXY1 VROCP1OXY1
(-)-Kaur-16-ene ROCP5ARO VROCP5ARO
(+)-Longicyclene ROCP4ALK VROCP4ALK
(E)-6,10-Dimethylundeca-5,9-dien-2-one ROCP6ARO VROCP6ARO
(E)-6,10-Dimethylundeca-5,9-dien-2-one ROCP6ARO VROCP6ARO
6,10-Dimethyl-5,9-undecadiene-2-one ROCP6ARO VROCP6ARO
2-Nonenal ROCP6ARO VROCP6ARO
7-heptadecene ROCP5ARO VROCP5ARO
Acetophenone ROCP6ARO VROCP6ARO
Benzyl benzoate ROCP5ARO VROCP5ARO
Benzyl acetate ROCP6ARO VROCP6ARO
Cinnamic acid ROCP2OXY2 VROCP2OXY2
Coniferyl alcohol ROCP1OXY3 VROCP1OXY3
Ethyl cinnamate ROCP5ARO VROCP5ARO
Jasmone ROCP6ARO VROCP6ARO
Linalool oxide pyranoid, cis-(+-)- ROCP5ARO VROCP5ARO
Linalool oxide py
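
Most of the differences printed above pair a file value with the same string prefixed by `V` (for example `ROCP6ARO` and `VROCP6ARO`), while a few, such as `SLOWROC` versus `UNKCRACMM` in the BEIS output, differ in other ways. A small sketch for separating the two kinds follows. The helper name `differs_only_by_prefix` is hypothetical, and the three pairs are copied from the printed output in this notebook.

```python
def differs_only_by_prefix(old, new, prefix='V'):
    """True when new is exactly prefix + old."""
    return new == prefix + old

# (name, value in file, value from mapper), taken from the output above
pairs = [
    ('para-cymene', 'ROCP6ARO', 'VROCP6ARO'),
    ('carbon monoxide', 'SLOWROC', 'UNKCRACMM'),
    ('Farnesol', 'ROCP5ARO', 'VROCP5ARO'),
]
prefix_only = [p for p in pairs if differs_only_by_prefix(p[1], p[2])]
other = [p for p in pairs if not differs_only_by_prefix(p[1], p[2])]
print(len(prefix_only), 'prefix-only;', len(other), 'other')
```

Sorting the differences this way makes it quicker to find the entries that may need a manual look, such as carbon monoxide here.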

