## **Finding the Lowest-Energy Tautomer(s)**
Here we use the [**Auto3D package**](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.2c00817) with the **ANI2xt neural network potential** to generate low-energy tautomers for 140 organic molecules in a short time (**less than 3 minutes on a T4 GPU**). Auto3D provides a streamlined pipeline for enumerating tautomers, generating 3D conformers, and identifying low-energy tautomeric forms for each molecule.

The dataset used in this tutorial is based on the [**Nicklaus Tautomer Database**](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b01156), which includes SMILES and experimental data for 2819 molecules, each with multiple tautomeric forms. We selected only molecules in the gas phase, as Auto3D performs geometry optimization in gas-phase conditions.


In [None]:
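import subprocess
import sys

In_Colab = 'google.colab' in sys.modules

if In_Colab:
  print("Running in Google Colab")
  print("installing required packages...")

  # Call pip through the notebook's own interpreter so the packages land in
  # the active environment; capture pip's output to keep the cell quiet and
  # raise an error if the installation fails.
  subprocess.run(
      [sys.executable, "-m", "pip", "install", "auto3d", "torchani", "rdkit", "mols2grid"],
      check=True, capture_output=True,
  )

  print("Installation completed")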

In [None]:
from rdkit import Chem
from Auto3D.auto3D import options
from Auto3D.tautomer import get_stable_tautomers
from rdkit.Chem import PandasTools
import mols2grid
import pandas as pd

First, load the molecule data (Nicklaus Tautomer Database):

In [None]:
all_tauts = pd.read_csv('https://raw.githubusercontent.com/SafiehLadani/Generative/main/data/all_tautomers.csv')

In [None]:
all_tauts.shape

The original dataset has 2819 molecules. Each tautomeric entry is annotated with experimental conditions, bibliographic details, structural identifiers, and chemical information (e.g., SMILES, molecular weight).

In [None]:
all_tauts.head()

We are only interested in molecules studied in the gas phase:

In [None]:
tauts_gas_state = all_tauts.query('Solvent == "Gas phase"')
tauts_gas_state.shape

Of the original 2819 molecules, only 145 were studied in the gas phase. Although each molecule has between 2 and 5 tautomeric SMILES, we extract only the first tautomer of each entry (SMILES_1) for analysis with Auto3D.

In [None]:
for col in tauts_gas_state.columns:
  if col.startswith('SMILES'):
    print(col)
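
To check how many tautomeric forms each entry carries, the non-empty SMILES columns can be counted per row. The sketch below is only illustrative: it uses a small made-up table rather than the real dataset, and it assumes the columns follow a `SMILES_1`, `SMILES_2`, ... naming pattern with missing forms stored as empty (NaN) cells.

```python
import pandas as pd

# Made-up rows standing in for the real table; the column naming pattern
# (SMILES_1, SMILES_2, ...) and the use of None for missing forms are assumptions
toy = pd.DataFrame({
    "SMILES_1": ["Oc1ccccn1", "CC(=O)CC(C)=O"],
    "SMILES_2": ["O=c1cccc[nH]1", "CC(O)=CC(C)=O"],
    "SMILES_3": [None, "C=C(O)CC(C)=O"],
})

# Select every column whose name starts with "SMILES"
smiles_cols = [c for c in toy.columns if c.startswith("SMILES")]

# Count the non-missing SMILES in each row
n_tautomers = toy[smiles_cols].notna().sum(axis=1)
print(n_tautomers.tolist())
```

The same two lines could be applied to `tauts_gas_state` to confirm the range of tautomer counts in the real data.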

In [None]:
df = tauts_gas_state[['SMILES_1']].reset_index()
df.head()

We keep the original index of each molecule by renaming the index column to original_ID, which allows easy reference back to the original entry in the Nicklaus Tautomer Database.

In [None]:
df = df.rename(columns={'index': 'original_ID'})[['SMILES_1','original_ID']]

In [None]:
df.head()

Since the **ANI2xt** model in Auto3D only supports molecules containing the elements H, C, N, O, F, Cl, and S, we filter out any SMILES that include unsupported elements. The function **is_allowed** checks each atom in the molecule and returns False if any element falls outside the allowed set.

In [None]:
allowed_elements = {'H', 'C', 'N', 'O', 'F', 'Cl', 'S'}

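def is_allowed(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit could not parse this SMILES, so treat it as unsupported
        return False
    # Keep the molecule only if every atom is in the supported element set
    return all(atom.GetSymbol() in allowed_elements for atom in mol.GetAtoms())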

df_final = df[df['SMILES_1'].apply(is_allowed)].reset_index(drop=True)
df_final.shape


After filtering, we have 140 molecules that are compatible with ANI2xt.

We save these molecules in a **.smi** file to use as input for Auto3D.


In [None]:
input_file = 'molecules.smi'
with open(input_file, 'w') as f:
    for index, row in df_final.iterrows():
        f.write(f"{row['SMILES_1']}\t{row['original_ID']}\n")
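
As a quick sanity check on the file layout (one molecule per line, SMILES first, then a tab, then the ID), the file can be read back and split on the tab. The sketch below is illustrative only: it writes two made-up records to a temporary file instead of touching `molecules.smi`, and `read_smi` is a hypothetical helper, not part of Auto3D.

```python
import os
import tempfile

def read_smi(path):
    """Return (smiles, mol_id) pairs from a tab-separated .smi file."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            smiles, mol_id = line.split("\t")
            records.append((smiles, mol_id))
    return records

# Made-up records written in the same SMILES<TAB>ID layout
example = [("Oc1ccccn1", "12"), ("CC(=O)CC(C)=O", "57")]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.smi")
    with open(path, "w") as f:
        for smiles, mol_id in example:
            f.write(f"{smiles}\t{mol_id}\n")
    records = read_smi(path)

print(records)
```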

### **Running Auto3D**
The **options** function in Auto3D sets the parameters that control the 3D structure generation and optimization workflow.

**Steps Auto3D follows to find the low-energy tautomer(s):**

1. Enumerate reasonable tautomers for each input SMILES (enumerate_tautomer=True).
2. Generate conformers for every tautomer and keep the top **k conformers** of each.
3. Group the conformers of all tautomers of the same SMILES together and select the **top tauto_k conformers** as the final stable tautomer 3D structures for that input SMILES.

Tautomers are ranked by their conformer energies, so conformers must first be generated for every tautomer.

The conformer generation step sets **max_confs=5** and **patience=200**. Because we care mainly about the relative stabilities of the tautomers, a handful of conformers per tautomer gives a reasonable representation while keeping the run efficient.
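
The selection logic described above can be sketched in plain Python. This is only an illustration of the idea, not Auto3D's actual implementation: the tautomer labels and energies are invented, `rank_tautomers` is a hypothetical helper, and the Hartree to kcal/mol factor is the usual approximate value.

```python
from collections import defaultdict

HARTREE_TO_KCAL = 627.509  # approximate conversion factor

# Hypothetical optimized conformers: (tautomer label, energy in Hartree)
conformers = [
    ("taut_A", -323.4012),
    ("taut_A", -323.4030),
    ("taut_B", -323.3985),
    ("taut_B", -323.3991),
    ("taut_C", -323.4101),
]

def rank_tautomers(conformers, k=1, tauto_k=2):
    """Keep the k best conformers per tautomer, then the tauto_k best overall."""
    by_tautomer = defaultdict(list)
    for label, energy in conformers:
        by_tautomer[label].append(energy)

    # Keep the k lowest-energy conformers of each tautomer
    pooled = []
    for label, energies in by_tautomer.items():
        for energy in sorted(energies)[:k]:
            pooled.append((label, energy))

    # Pool all tautomers of the molecule together and sort by energy
    pooled.sort(key=lambda pair: pair[1])
    best = pooled[0][1]

    # Report energies relative to the lowest one, in kcal/mol
    return [
        (label, round((energy - best) * HARTREE_TO_KCAL, 2))
        for label, energy in pooled[:tauto_k]
    ]

ranking = rank_tautomers(conformers, k=1, tauto_k=2)
print(ranking)
```

With k=1 each tautomer is represented by its single best conformer, so tauto_k directly controls how many distinct tautomers are reported.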

Other parameters of the options function:

**enumerate_tautomer:** If True, Auto3D will generate possible tautomeric forms of the input molecules.

**tauto_engine:** Specifies the engine used for tautomer enumeration (e.g., "**rdkit**").

**optimizing_engine:** Defines the neural network potential used for geometry optimization (**ANI2xt** here).


In [None]:
args = options(input_file, k=1, enumerate_tautomer=True, tauto_engine="rdkit",
                optimizing_engine="ANI2xt",
                max_confs=5, patience=200, use_gpu=True)


The **get_stable_tautomers** function accepts the arguments returned by the options function and returns the **tauto_k** tautomers with the lowest energies (this takes less than 3 minutes on a T4 GPU).


In [None]:
tautomer_out = get_stable_tautomers(args, tauto_k=3)

We can now load the tautomer_out results into a DataFrame for easier analysis and visualization.

In [None]:
output_df = PandasTools.LoadSDF(tautomer_out)

In [None]:
output_df.head()
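
Properties read from an SDF file may arrive as strings, so numeric columns are worth converting before filtering or sorting. The sketch below uses a small made-up table that borrows the `E_tautomer_relative(kcal/mol)` column name from this tutorial; the IDs and energies are invented, and it assumes that a relative energy of 0 marks the lowest-energy tautomer of each input molecule.

```python
import pandas as pd

# Made-up stand-in for output_df; real IDs and energies will differ
toy = pd.DataFrame({
    "ID": ["mol1_a", "mol1_b", "mol2_a", "mol2_b"],
    "E_tautomer_relative(kcal/mol)": ["0.0", "3.71", "1.25", "0.0"],
})

energy_col = "E_tautomer_relative(kcal/mol)"

# Convert the string energies to floats so they can be compared numerically
toy[energy_col] = pd.to_numeric(toy[energy_col])

# Assumption: the most stable tautomer of each molecule has relative energy 0
lowest = toy[toy[energy_col] == 0.0]
print(lowest["ID"].tolist())
```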

The **mols2grid library** creates a grid of molecule images, making it easier to explore chemical structures along with their associated data.

The **mol_col**='ROMol' argument specifies which column in the DataFrame contains RDKit molecule objects to render.

The **subset parameter** selects the columns to display alongside each molecule: in this case, the **molecule's ID** and its **relative tautomer energy** in kcal/mol.

The **transform parameter** formats the energy values with the custom function (**two_decimal**), which rounds each number to two decimal places for cleaner visualization.

In [None]:
two_decimal = lambda x: round(float(x), 2)

In [None]:
mols2grid.display(
    output_df,
    mol_col='ROMol',
    subset=['ID', 'E_tautomer_relative(kcal/mol)'],
    transform={'E_tautomer_relative(kcal/mol)': two_decimal},
)

The checkbox shown with each molecule allows interactive selection, which can be useful for further filtering or for exporting selected compounds.

**Acknowledgment**

Thanks to [Pat Walters](https://github.com/PatWalters) for providing the original tutorial on using Auto3D.