# Inference using the MolFormer Model

In this notebook, we show how to perform inference using GT4SD and finetuned variants of the MolFormer model. The currently available models were trained on the datasets provided by the [official MolFormer repository](https://github.com/IBM/molformer).
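The cells below select each finetuned variant by a string key in `MOLECULE_PROPERTY_PREDICTOR_FACTORY`. As a minimal sketch, assuming the factory behaves like a dictionary keyed by predictor name, the MolFormer entries could be listed with a small helper. The helper name `molformer_keys` and the key `some_other_property` are hypothetical and used only for illustration.

```python
from typing import Iterable, List


def molformer_keys(factory_keys: Iterable[str], prefix: str = "molformer") -> List[str]:
    """Return, in sorted order, the keys that start with the given prefix."""
    return sorted(key for key in factory_keys if key.startswith(prefix))


# Stand-in key list: the three MolFormer keys used in this notebook plus one
# hypothetical unrelated key, so the sketch runs without GT4SD installed.
example_keys = [
    "molformer_regression",
    "molformer_classification",
    "molformer_multitask_classification",
    "some_other_property",  # hypothetical, for illustration only
]

found = molformer_keys(example_keys)
print(found)
```

With GT4SD installed, passing the factory itself (`molformer_keys(MOLECULE_PROPERTY_PREDICTOR_FACTORY)`) should work the same way if it is a plain mapping, since iterating a mapping yields its keys.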

### Models for regression

This method can be used for any regression task.

In [1]:
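from gt4sd.properties.molecules import MOLECULE_PROPERTY_PREDICTOR_FACTORY

# Look up the predictor class and its parameter class by name.
property_class, parameters_class = MOLECULE_PROPERTY_PREDICTOR_FACTORY["molformer_regression"]
# Instantiate the predictor, selecting the finetuned model via algorithm_version.
model = property_class(parameters_class(algorithm_version="molformer_alpha_public_test"))

# Predict the property for a single molecule given as a SMILES string.
model(input=["OC12COC3=NCC1C23"])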

[69.26847839355469]
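The call above passes a list holding a single SMILES string and gets back a list with one value. A minimal sketch of scoring several molecules at once follows. It assumes the predictor returns exactly one value per input, in input order, which the single-molecule example suggests but does not prove. The helper `predict_batch` and the stand-in `fake_predictor` are hypothetical and are not part of GT4SD.

```python
from typing import Callable, List, Sequence, Tuple


def predict_batch(
    predictor: Callable[..., Sequence[float]], smiles: List[str]
) -> List[Tuple[str, float]]:
    """Score several SMILES strings in one call and pair each with its prediction."""
    predictions = predictor(input=smiles)
    # Assumption: one prediction per input molecule, returned in input order.
    if len(predictions) != len(smiles):
        raise ValueError(
            f"expected {len(smiles)} predictions, got {len(predictions)}"
        )
    return list(zip(smiles, predictions))


def fake_predictor(input: List[str]) -> List[float]:
    """Stand-in for a GT4SD predictor: returns the length of each string."""
    return [float(len(s)) for s in input]


scores = predict_batch(fake_predictor, ["CCO", "c1ccccc1"])
for smi, value in scores:
    print(f"{smi}: {value}")
```

With the regression model from the cell above, the same helper would be called as `predict_batch(model, ["OC12COC3=NCC1C23", "CCO"])`.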

### Models for classification

This method can be used for any binary classification task.

In [2]:
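from gt4sd.properties.molecules import MOLECULE_PROPERTY_PREDICTOR_FACTORY

# Look up the binary classification predictor and its parameter class.
property_class, parameters_class = MOLECULE_PROPERTY_PREDICTOR_FACTORY["molformer_classification"]
# Instantiate the predictor, selecting the finetuned model via algorithm_version.
model = property_class(parameters_class(algorithm_version="molformer_bace_public_test"))

# Predict the class label for a single molecule given as a SMILES string.
model(input=["OC12COC3=NCC1C23"])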

[1]

### Models for multitask classification

This method can be used for any multitask classification task, where several labels are predicted for each molecule.

In [3]:
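# Reuses MOLECULE_PROPERTY_PREDICTOR_FACTORY imported in the cells above.
property_class, parameters_class = MOLECULE_PROPERTY_PREDICTOR_FACTORY["molformer_multitask_classification"]
# Instantiate the predictor, selecting the finetuned model via algorithm_version.
model = property_class(parameters_class(algorithm_version="molformer_clintox_test"))

# Predict and print the labels for a single molecule given as a SMILES string.
print(model(["Ic1cc(ccc1)C[NH2+]C[C@@H](O)[C@@H](NC(=O)c1cc(cc(c1)C)C(=O)N(CCC)CCC)Cc1cc(F)cc(F)c1"]))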
