<a href="https://colab.research.google.com/github/yifan950/Sublimation_enthalpy_model/blob/main/predict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



This notebook loads the trained model published with the article "Accelerated predictions of the sublimation enthalpy of organic materials with machine learning" and uses it to predict the sublimation enthalpy of new candidate molecules. For more details, please refer to XXX.

In [1]:
#@title Predict the Sublimation Enthalpy of Organic Molecules
#@markdown *Enter your candidate's SMILES string first, then press the run button on the left.*

#@markdown *This cell installs RDKit version 2022.9.05 and then prints a prediction for the input SMILES.*

# Clone the repository only if it is not already present
!test -d /content/Sublimation_enthalpy_model || git clone https://github.com/yifan950/Sublimation_enthalpy_model.git /content/Sublimation_enthalpy_model

# Install RDKit version 2022.9.05 (pip reports it as 2022.9.5)
!pip install rdkit==2022.9.05

#Load necessary packages
import joblib
import os
import pickle
import numpy as np
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.preprocessing import StandardScaler
import warnings

# Suppress all UserWarnings, which include scikit-learn's InconsistentVersionWarning
warnings.filterwarnings("ignore", category=UserWarning)

#Paths to the required files
scaler_path = "/content/Sublimation_enthalpy_model/845scaler.save"
model_path = "/content/Sublimation_enthalpy_model/845model.pkl"

#Load the scaler
with open(scaler_path, 'rb') as f:
    scaler = joblib.load(f)

#Load the model
with open(model_path, 'rb') as f:
    model = pickle.load(f)

# @markdown 1. Enter a SMILES string:
smiles = "CCCC" # @param {type:"string"}

def compute_descriptors(smiles):
    """
    Compute RDKit molecular descriptors for a given SMILES string.
    Parameters:
        smiles (str): The SMILES string of the molecule.
    Returns:
        np.array: A NumPy array of molecular descriptors.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES string: {smiles}")

    # Extract descriptors
    descriptor_values = [func(mol) for _, func in Descriptors.descList]
    return np.array(descriptor_values)

def get_smiles_input():
    """Request user input for a SMILES string."""
    return input("Enter a SMILES string: ")

def predict_sublimation_enthalpy(smiles):
    """Predict the sublimation enthalpy for a given SMILES string."""
    try:
        # Compute molecular descriptors
        descriptors = compute_descriptors(smiles)

        # Reshape and normalize the descriptors using the scaler
        descriptors_normalized = scaler.transform([descriptors])

        # Use the model to predict the sublimation enthalpy
        prediction = model.predict(descriptors_normalized)
        return prediction[0]
    except Exception as e:
        print(f"Error: {e}")
        return None


if __name__ == "__main__":

    # Predict the sublimation enthalpy
    enthalpy = predict_sublimation_enthalpy(smiles)

    # Display the result
    if enthalpy is not None:
        print(f"Predicted Sublimation Enthalpy: {enthalpy:.2f} kJ/mol")
    else:
        print("Failed to predict sublimation enthalpy.")

Cloning into 'Sublimation_enthalpy_model'...
remote: Enumerating objects: 122, done.
remote: Counting objects: 100% (122/122), done.
remote: Compressing objects: 100% (118/118), done.
remote: Total 122 (delta 53), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (122/122), 6.66 MiB | 9.35 MiB/s, done.
Resolving deltas: 100% (53/53), done.
Collecting rdkit==2022.9.05
  Downloading rdkit-2022.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.9 kB)
Downloading rdkit-2022.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.4 MB)
   29.4/29.4 MB 35.5 MB/s eta 0:00:00
Installing collected packages: rdkit
Successfully installed rdkit-2022.9.5
Collecting scikit-learn==1.0
  Downloading scikit-learn-1.0.tar.gz (7.8 MB)
     7.8/7.8 MB 44.3 MB/s eta 0:00:00

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


Predicted Sublimation Enthalpy: 38.95 kJ/mol
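
Some RDKit descriptors can evaluate to NaN or infinity for unusual molecules, and the scaler or model may reject such input or return a meaningless prediction. The following is a minimal sketch, not part of the published notebook, of a check that could be run on the output of `compute_descriptors` before scaling. The helper name `find_non_finite` and the example names and values are hypothetical and serve only as an illustration.

```python
import math

def find_non_finite(names, values):
    """Return the names of descriptors whose value is NaN or infinite."""
    return [
        name
        for name, value in zip(names, values)
        if not math.isfinite(float(value))
    ]

# Hypothetical descriptor names and values, for illustration only.
example_names = ["MolWt", "MaxPartialCharge", "Ipc"]
example_values = [58.12, float("nan"), float("inf")]

bad = find_non_finite(example_names, example_values)
if bad:
    print("Non-finite descriptors: " + ", ".join(bad))
```

In the notebook, the names could be taken from `[name for name, _ in Descriptors.descList]` and the values from `compute_descriptors(smiles)`, so that a failed prediction can be traced to specific descriptors.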

