# Map SPECIATE species to CRACMM

---
    author: Nash Skipper
    date: 2024-08-09
---

Assign a CRACMM species to each SPECIATE species using the CRACMM mapper. The `cracmm_mapper` module depends on [rdkit](https://www.rdkit.org/).

## Setup

In [1]:
import numpy as np
import pandas as pd
import warnings

In [2]:
## Install rdkit if not already installed

# !python -m pip install --user rdkit

# to install in the current kernel:
# %pip install rdkit

In [3]:
# set location of mapper downloaded from https://github.com/USEPA/CRACMM/
import sys
utildir = '/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/'   
sys.path.append(utildir)

# Import the python utilities
import cracmm_mapper as cracmm2   # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)

In [4]:
datadir = '/work/MOD3DEV/tskipper/cracmm_hcho/cracmm2_spc_map/'    # data files of mappings

outputdir = datadir

In [5]:
pd.set_option('display.max_rows', None)
pd.options.mode.copy_on_write = True
warnings.simplefilter('ignore') # ignore warnings (comment out to see warnings for species that could not be mapped)
csvout_kw = dict(sep=',', na_rep='', float_format=None, columns=None, header=True, index=False)

## SPECIATE mapping

The input SPECIATE file is available from https://github.com/USEPA/CRACMM/tree/main/emissions/SPECIATEInputs

In [6]:
filename = datadir + 'SPECIATEv5.2x_fromCRACMM2alpha.csv' 
df = pd.read_csv(filename)
# keep the mapping already in the file under a separate name so it can be compared with the new mapping
orig_map_colname = 'CRACMM2alpha'
df = df.rename(columns=dict(CRACMM2=orig_map_colname))

### calculate C* from vapor pressure

$$ C^* \text{ must have units of } \frac{\mu g}{m^3} $$
$$ C^* = \frac{p \cdot M \cdot 10^6}{R \cdot T} $$
$$ p \text{ and } M \text{ are from the input csv file} $$
$$ p = \text{vapor pressure} \left( Pa \right) $$
$$ M = \text{molecular weight} \left( \frac{g}{mol} \right) $$
$$ 10^6 = \text{conversion from } g \text{ to } \mu g $$
$$ R = 8.314 \ \frac{m^3 \cdot Pa}{mol \cdot K} $$
$$ T = 298 \ K $$

In [7]:
vp_k = 'VP_Pascal_OPERA'
mw_k = 'SPEC_MW'
R = 8.314
T = 298
df['log10Cstar_ugm3'] = np.log10(df[vp_k] * df[mw_k]*10**6 / (R * T))
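
As a quick sanity check of the formula above, the same calculation can be written as a small standalone function and applied to a made-up compound. This is only a sketch: the function name `log10_cstar_ugm3` and the example values (1 Pa, 100 g/mol) are illustrative assumptions and are not part of the CRACMM utilities.

```python
import numpy as np

R = 8.314   # gas constant, m^3 Pa / (mol K)
T = 298.0   # temperature, K

def log10_cstar_ugm3(vp_pa, mw_g_mol, temp_k=T):
    """Return log10(C*) with C* in ug/m^3.

    vp_pa    : vapor pressure in Pa (scalar or array-like)
    mw_g_mol : molecular weight in g/mol (scalar or array-like)
    """
    vp = np.asarray(vp_pa, dtype=float)
    mw = np.asarray(mw_g_mol, dtype=float)
    # p * M / (R * T) is in g/m^3; the factor 1e6 converts g to ug
    cstar = vp * mw * 1e6 / (R * temp_k)
    return np.log10(cstar)

# hypothetical compound: vapor pressure 1 Pa, molecular weight 100 g/mol
example = log10_cstar_ugm3(1.0, 100.0)
print(float(example))
```

For these illustrative inputs C* is about 4e4 µg/m³, so the printed value is close to 4.6. A tenfold increase in vapor pressure raises the result by exactly 1.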

### run cracmm2 mapper

In [8]:
smiles_k = 'Smiles Notation'
koh_k    = 'ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA'
cstar_k  = 'log10Cstar_ugm3'
df['CRACMM2'] = df.apply(lambda x: cracmm2.get_cracmm_roc(x[smiles_k], x[koh_k], x[cstar_k]), axis=1)

# check if any species mappings changed
df_checkmatch = df.eval(f'match = {orig_map_colname}==CRACMM2')
show_cols = ['SPECIES_NAME', orig_map_colname, 'CRACMM2']
changed = df_checkmatch.loc[~df_checkmatch['match'], show_cols]   # rows where the mapping differs
if len(changed) > 0:
    print(f'the species mappings below changed from {orig_map_colname}')
    display(changed)
else:
    print(f'all species matched {orig_map_colname} mapping')

# save output
df = df.drop(columns=[orig_map_colname, cstar_k])
df.to_csv(outputdir+'SPECIATEv5.2x_fromCRACMM.csv', **csvout_kw)

the species mappings below changed from CRACMM2alpha


|     | SPECIES_NAME    | CRACMM2alpha | CRACMM2   |
|-----|-----------------|--------------|-----------|
| 856 | Carbon monoxide | SLOWROC      | UNKCRACMM |
| 857 | Carbon dioxide  | SLOWROC      | UNKCRACMM |
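
One caveat with an `==` comparison like the one in the cell above: in pandas, a missing value never compares equal to another missing value, so a species that is unmapped in both columns may be reported as changed. The sketch below shows one NaN-aware way to do the comparison. The helper name `changed_mappings`, the column names `old_map` and `new_map`, and the toy rows are all hypothetical and are not taken from the SPECIATE file.

```python
import numpy as np
import pandas as pd

def changed_mappings(df, old_col, new_col):
    """Return rows where two mapping columns differ.

    Rows where both columns are missing are treated as unchanged.
    """
    old, new = df[old_col], df[new_col]
    same = (old == new) | (old.isna() & new.isna())
    return df.loc[~same]

# toy data (hypothetical species names)
toy = pd.DataFrame({
    'SPECIES_NAME': ['species A', 'species B', 'species C'],
    'old_map': ['SLOWROC', np.nan, 'SLOWROC'],
    'new_map': ['UNKCRACMM', np.nan, 'SLOWROC'],
})

changed = changed_mappings(toy, 'old_map', 'new_map')
print(changed)
```

Only `species A` is returned here; `species B`, which is missing in both columns, counts as a match.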


## which species mappings changed from CRACMM1 to CRACMM2 mapper?

In [9]:
display_changes = False
df_checkmatch = df.eval('match = CRACMM1==CRACMM2')
show_cols = ['SPECIES_NAME', 'CRACMM1', 'CRACMM2']
changed = df_checkmatch.loc[~df_checkmatch['match'], show_cols]   # rows where the mapping differs
if len(changed) > 0:
    print(f'there are {len(changed)} changes')
    print(f'out of {len(df_checkmatch)} total species')
    if display_changes:
        print('the species mappings below changed from CRACMM1')
        display(changed)
    else:
        print('set display_changes to True to see them all')
else:
    print('all species matched CRACMM1 mapping')

there are 1382 changes
out of 2890 total species
set display_changes to True to see them all
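
With over a thousand changes, a table of how many species moved between each pair of mappings may be easier to read than the full list. The sketch below shows one way to build that summary with `DataFrame.value_counts`. The helper name `summarize_changes`, the count column name `n_species`, and the toy class labels are hypothetical; in the notebook the two columns would be `CRACMM1` and `CRACMM2`.

```python
import pandas as pd

def summarize_changes(df, old_col, new_col):
    """Count how many rows moved between each (old, new) mapping pair.

    Rows with a missing value in either column are dropped by value_counts.
    """
    changed = df.loc[df[old_col] != df[new_col], [old_col, new_col]]
    return (changed.value_counts()      # counts per unique (old, new) pair, largest first
                   .rename('n_species')
                   .reset_index())

# toy data with made-up class labels
toy_maps = pd.DataFrame({
    'old_map': ['CLASS_A', 'CLASS_A', 'CLASS_A', 'CLASS_B'],
    'new_map': ['CLASS_X', 'CLASS_X', 'CLASS_A', 'CLASS_Y'],
})

summary = summarize_changes(toy_maps, 'old_map', 'new_map')
print(summary)
```

In the toy data, two rows move from `CLASS_A` to `CLASS_X` and one from `CLASS_B` to `CLASS_Y`, so the summary has two rows with the larger group first.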


---