### ThermoPINN

See [GitHub Source Code Repository](https://github.com/BYU-PRISM/ThermoPINN)

Knotts, T., Hedengren, J.D., Babaei, M.R., Physics-Informed Deep Learning for Prediction of Thermophysical Properties: The Parachor Method for Surface Tension, AIChE Annual Meeting, Phoenix, AZ, Nov 13-18, 2022.

Chemical thermophysical properties are needed for chemical handling, design of production and storage facilities, separations, and manufacture. The Design Institute for Physical Properties (DIPPR) was created in 1978 (under the direction of the American Institute of Chemical Engineers) and is the best source of critically evaluated thermophysical, safety, and environmental properties. When experimental data for a chemical is not available, DIPPR predicts the values for properties of that chemical. Creating more accurate and broadly applicable prediction methods for thermophysical properties is an area of constant research. One example of a relevant thermophysical property is normal boiling point (NBP).
A set of 1600 compounds is used for training, validating, and testing a Physics-Informed Neural Network (PINN) to improve Normal Boiling Point (NBP) prediction based on group contribution methods [1]. Physics-informed deep learning seeks to improve predictive accuracy by combining physics-based information with machine learning. Standard artificial neural networks are known to extrapolate poorly when used outside the training region. An artificial neural network produces not only a prediction but also a self-assessment of uncertainty. Group contributions to the parachor should be strictly additive because the parachor represents the volume of space a molecule occupies. However, earlier approaches assign negative contributions to several groups, suggesting a suboptimal fit between the parachor and the groups. We recently combined machine learning (ML) with a physics-based constraint to achieve better predictions than any previous method for surface tension [2]. This presentation outlines additional progress on NBP, with a comparison to other leading prediction methods. For compounds with Tb > 600 K, the PINN model yields a Mean Absolute Error (MAE) of 24.3 °C (3.6% Mean Percentage Error, MPE), while the Joback method yields about 79.0 °C MAE (12% MPE). Across the results, predicted NBP values for compounds in the silane and imine families are less accurate. The NBP results further demonstrate that combining physics-based constraints with machine learning produces significant improvements in prediction methods for the thermophysical properties that are crucial in chemical engineering.
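
The two error measures quoted above can be computed as in the following minimal sketch. The boiling points are made-up illustrative values, not DIPPR data, and the function names `mean_absolute_error` and `mean_percentage_error` are hypothetical. The sketch assumes that the percentage error is taken relative to the absolute temperature in kelvin; a temperature difference has the same magnitude in °C and in K.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all compounds."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_percentage_error(actual, predicted):
    """Average of |actual - predicted| / actual, expressed in percent."""
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Illustrative normal boiling points in kelvin (assumed values, not real data).
tb_actual = [600.0, 650.0, 700.0]
tb_predicted = [630.0, 637.0, 714.0]

mae = mean_absolute_error(tb_actual, tb_predicted)
mpe = mean_percentage_error(tb_actual, tb_predicted)
print(f"MAE = {mae:.1f} K, MPE = {mpe:.1f}%")
```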

References Cited
1. Ericksen, Wilding, W.V., Oscarson, J.L., and Rowley, R.L., Use of the DIPPR Database for Development of QSPR Correlations: Normal Boiling Point, J. Chem. Eng. Data, 47(5), 1293–1302, July 2002. DOI: 10.1021/je0255372
2. Knotts, T., Hedengren, J.D., Babaei, M.R., Physics-Informed Deep Learning for Prediction of Thermophysical Properties: The Parachor Method for Surface Tension, AIChE Annual Meeting, Phoenix, AZ, Nov 13-18, 2022.

### Launch in Binder

<a href='https://mybinder.org/v2/gh/BYU-PRISM/ThermoPINN/main?urlpath=voila%2Frender%2Fnotebook.ipynb'><img align=left width=100px src='https://camo.githubusercontent.com/581c077bdbc6ca6899c86d0acc6145ae85e9d80e6f805a1071793dbe48917982/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667'></a>

Install `thermo`, `rdkit`, `ipyvuetify`, and `scikit-learn` (version <=1.2.0)

In [1]:
#pip install thermo

In [None]:
#pip install rdkit

In [None]:
#pip install ipyvuetify

In [None]:
#pip install scikit-learn==1.2.0
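
Because the notebook expects `scikit-learn` at version 1.2.0 or older, it can help to check the installed version before loading the pickled hasher. The following is a minimal sketch using only the standard library; `version_tuple` and `is_supported` are hypothetical helper names, and the simple parsing assumes a plain `major.minor.patch` version string.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '1.2.0' into (1, 2, 0); suffixes such as 'rc1' are cut off."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_supported(installed, maximum="1.2.0"):
    """True when the installed version is not newer than the maximum."""
    return version_tuple(installed) <= version_tuple(maximum)


try:
    installed = version("scikit-learn")
    status = "ok" if is_supported(installed) else "newer than 1.2.0"
    print(f"scikit-learn {installed}: {status}")
except PackageNotFoundError:
    print("scikit-learn is not installed")
```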

In [2]:
import pickle 
import numpy as np
import pandas as pd
import ipyvuetify as v

import warnings
warnings.filterwarnings("ignore")
from thermo import Joback

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
import tensorflow as tf

In [3]:
def nbp_calc(smiles, family):
    with open('hasher.pkl', 'rb') as file:
        fh = pickle.load(file=file)
    
    with open('params_Final.pkl', 'rb') as file:
        min_, max_ = pickle.load(file)

    model = tf.keras.models.load_model('PINN_Final.h5')

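    # Count Joback groups from the SMILES string. If Joback cannot process
    # the string (for example, because it is invalid), return the -1000
    # sentinel that predict() checks for.
    try:
        J = Joback(smiles).counts
    except Exception:
        return -1000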

    fga = np.zeros(40)
    for key in J:
        fga[key-1] = J[key]

    family_val = fh.transform(np.array([family])).toarray()

    X = np.array(family_val.tolist()[0] + fga.tolist())
    x = (X-min_[:-1])/(max_[:-1] - min_[:-1]) 
    
    yp = model(x.to_numpy()[None, :])[0]
    Yp = yp*(max_[-1] - min_[-1]) + min_[-1]
    
    return Yp[0].numpy()
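
The feature preparation in `nbp_calc` can be illustrated without the pickled files. The sketch below fills a 40-slot vector from 1-based group indices, as the loop in the function does, and applies the same min-max scaling and its inverse. The group counts and the minimum and maximum values are invented for illustration, a single pair of bounds stands in for the per-feature bounds stored in `params_Final.pkl`, and the helper names are hypothetical and not part of the repository.

```python
N_GROUPS = 40  # length of the group-count vector used in nbp_calc


def counts_to_vector(counts, n_groups=N_GROUPS):
    """Place each count at position (group index - 1); other slots stay 0."""
    vec = [0.0] * n_groups
    for group_id, count in counts.items():
        vec[group_id - 1] = float(count)
    return vec


def scale(value, lo, hi):
    """Min-max scaling: maps lo to 0 and hi to 1."""
    return (value - lo) / (hi - lo)


def unscale(scaled, lo, hi):
    """Inverse of scale: maps 0 back to lo and 1 back to hi."""
    return scaled * (hi - lo) + lo


# Illustrative counts: two groups with index 1 and one group with index 2.
counts = {1: 2, 2: 1}
fga = counts_to_vector(counts)

# Assumed bounds for the inputs and for the boiling point in kelvin.
lo, hi = 0.0, 4.0
scaled = [scale(v, lo, hi) for v in fga]

tb_min, tb_max = 200.0, 800.0
tb = unscale(0.5, tb_min, tb_max)  # a scaled model output of 0.5
print(scaled[:3], tb)
```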


In [4]:
df_smiles = pd.read_csv('smf.csv')
df_smiles.drop(columns='Unnamed: 0', inplace=True)
df_family = df_smiles.drop_duplicates(subset='Family')[['Family']]

In [5]:
def guess_family(item, *args):
    # v_model may be None, a plain string, or a list; reduce it to one string.
    val = item.v_model
    if isinstance(val, list):
        val = val[0] if val else ''
    elif val is None:
        val = ''

    if val in df_smiles['smiles'].values:
        family_field.v_model = df_smiles.loc[df_smiles['smiles']==val]['Family'].to_numpy().tolist()
    
    update()
    
def update(*args):
    # Enable the button only when both fields hold a value.
    predict_btn.disabled = not (smiles_field.v_model and family_field.v_model)
        
def predict(item, *args):
    
    predict_btn.loading = True
    
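    user_smiles = smiles_field.v_model
    user_family = family_field.v_model

    # Either field may hold a one-element list or a plain string.
    if isinstance(user_smiles, list):
        user_smiles = user_smiles[0]
    if isinstance(user_family, list):
        user_family = user_family[0]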
        
    nbp = nbp_calc(smiles=user_smiles, family=user_family)
    
    if nbp == -1000:
        results.children = ["SMILES Parsing Error! Please enter a valid SMILES value."]
    else:
        results.children = [f'Normal Boiling Point = {nbp-273.15 :.2f} degC']
        
    predict_btn.loading = False

smiles_field = v.Combobox(label='Smiles', items=df_smiles[['smiles']].to_numpy().tolist(), v_model=[], class_='mx-4', dark=True, hide_details=True, hide_no_data=True, counter_value=5)
family_field = v.Autocomplete(label='Family', items=df_family[['Family']].to_numpy().tolist(), v_model=[], class_='mx-4', dark=True, hide_no_data=True)
predict_btn = v.Btn(children=['Predict'], color='orange lighten-1', dark=True, disabled=True)
results = v.Text(children=[''], class_='align-right mt-2')

smiles_field.on_event('change', guess_family)
family_field.on_event('change', update)
predict_btn.on_event('click', predict)

selection_row = v.Row(children=[smiles_field, family_field])
action_row = v.Row(children=[predict_btn, results], class_='mx-1 pb-2 justify-space-between')

title = v.CardTitle(children=['Thermo-PINN'])
app_card = v.Card(children=[title, selection_row, action_row], class_='pa-4 ma-4', dark=True, style_='max-width:600px')

app = v.App(children=[app_card])
app

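
The callbacks in the app cell have to cope with `v_model` values of different shapes. A minimal sketch of that handling, independent of ipyvuetify, is given below. `first_value` and `can_predict` are hypothetical helper names, the family label is a made-up example, and the assumption that `v_model` is `None`, a string, or a list is taken from the checks in `guess_family` and `predict`.

```python
def first_value(v_model):
    """Reduce a widget value (None, string, or list) to a single string."""
    if v_model is None:
        return ""
    if isinstance(v_model, list):
        return v_model[0] if v_model else ""
    return v_model


def can_predict(smiles_model, family_model):
    """The Predict button is usable only when both fields hold a value."""
    return bool(first_value(smiles_model)) and bool(first_value(family_model))


print(first_value(["CCO"]))            # list input
print(first_value("CCO"))              # string input
print(can_predict(["CCO"], []))        # family still empty
print(can_predict("CCO", ["Alcohols"]))  # 'Alcohols' is an illustrative label
```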