# SMILES Molecule Generation and Analysis

This notebook provides tools for generating new molecular structures from input SMILES (Simplified Molecular Input Line Entry System) strings using the REINVENT framework, and for analyzing these molecules based on various chemical properties using RDKit. The results are ranked to help identify promising drug candidates based on drug-likeness and other ADMET properties.


## Molecule Generation
Run the cell below, enter a SMILES string in the text field, and click the "Generate Molecules" button. The input is written to mol2mol.smi, and REINVENT is run with the mol2mol.toml configuration to generate new molecular structures as SMILES strings, which are saved in a file called sampling.csv. REINVENT uses a deep learning generative model to propose novel compounds based on the input structure.

In [3]:
import ipywidgets as widgets
from IPython.display import display

# Function to update the SMILES file with user input
def update_smiles_file(smiles):
    with open('mol2mol.smi', 'w') as file:
        file.write(smiles + '\t' + 'UserInput')

# Widget setup
smiles_input = widgets.Text(
    value='',
    placeholder='Type your SMILES here',
    description='SMILES:',
    disabled=False
)
button = widgets.Button(description="Generate Molecules")
output = widgets.Output()

display(smiles_input, button, output)

# Button click event handler
def on_button_clicked(b):
    with output:
        print("Generating molecules for SMILES: ", smiles_input.value)
        # Update the SMILES file with the user's input
        update_smiles_file(smiles_input.value)
        # Directly run REINVENT from the Jupyter notebook
        print("Running REINVENT...")
        !reinvent mol2mol.toml

button.on_click(on_button_clicked)

Text(value='', description='SMILES:', placeholder='Type your SMILES here')

Button(description='Generate Molecules', style=ButtonStyle())

Output()
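
The handler above writes the contents of the text field to mol2mol.smi as entered. Below is a minimal sketch of a slightly more defensive writer. The function name `write_smiles_file` and its `path` and `label` parameters are hypothetical and are not part of the notebook or of REINVENT; only the tab-separated `SMILES<TAB>UserInput` layout is taken from the cell above. The sketch trims surrounding whitespace, rejects empty input, and ends the line with a newline.

```python
from pathlib import Path

def write_smiles_file(smiles, path='mol2mol.smi', label='UserInput'):
    """Write one 'SMILES<TAB>label' line, mirroring update_smiles_file.

    Hypothetical helper for illustration only.
    """
    cleaned = smiles.strip()
    if not cleaned:
        # Avoid launching a generation run on an empty input file
        raise ValueError("Empty SMILES string; nothing to write.")
    Path(path).write_text(f"{cleaned}\t{label}\n")
    return cleaned
```

If this approach fits, `on_button_clicked` could call such a helper in place of `update_smiles_file` and print the error message instead of starting REINVENT when the field is empty.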

## Property Calculation

After generating the molecules, the notebook calculates molecular properties using RDKit: molecular weight, logP, and the number of hydrogen bond donors and acceptors. Topological polar surface area (TPSA) and the QED drug-likeness estimate are added later, in the ranking step. These properties are commonly used to assess the drug-likeness of the molecules.

## Dynamic Visualization

The properties of the generated molecules are displayed below in a dynamic table. Choose which columns to display in the selection list; hold Ctrl (Cmd on macOS) and click to select several at once. The complete results are also saved as a CSV file called smiles_characteristics.csv.

In [2]:
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors
import ipywidgets as widgets
from IPython.display import display, clear_output

# Load the CSV file and calculate properties
def load_and_compute_properties(csv_file):
    df = pd.read_csv(csv_file)
    properties = {
        'MolWt': Descriptors.MolWt,
        'LogP': Descriptors.MolLogP,
        'NumHDonors': Descriptors.NumHDonors,
        'NumHAcceptors': Descriptors.NumHAcceptors
    }
    
    for prop in properties.keys():
        df[prop] = None

    for index, row in df.iterrows():
        mol = Chem.MolFromSmiles(row['SMILES'])
        if mol:
            for prop, func in properties.items():
                df.at[index, prop] = func(mol)
    
    return df

# Function to display DataFrame in a dynamic table
def display_properties_table(dataframe):
    # Create a multi-select widget for selecting properties to display
    options = list(dataframe.columns)
    select_widget = widgets.SelectMultiple(
        options=options,
        value=['SMILES', 'MolWt', 'LogP', 'NumHDonors', 'NumHAcceptors'],
        description='Columns',
        disabled=False
    )

    output_area = widgets.Output()
    display(select_widget, output_area)

    #Write results to CSV file
    dataframe.to_csv('smiles_characteristics.csv', index=False)

    def update_table(change):
        with output_area:
            clear_output(wait=True)  # Clear the previous output
            display(dataframe[list(select_widget.value)])  # Display the new table based on selection

    select_widget.observe(update_table, names='value')
    update_table(None)  # Initialize view

# Example usage
df_properties = load_and_compute_properties('sampling.csv')
display_properties_table(df_properties)


SelectMultiple(description='Columns', index=(0, 4, 5, 6, 7), options=('SMILES', 'Input_SMILES', 'Tanimoto', 'N…

Output()
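
To see what the descriptor functions return for a single molecule, the following sketch computes the same four properties for one SMILES string. The helper name `describe` is hypothetical; the RDKit calls are the ones used in the cell above. It assumes RDKit is installed and returns `None` when RDKit cannot parse the input.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

def describe(smiles):
    """Return a dict of descriptors for one SMILES, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return {
        'MolWt': Descriptors.MolWt(mol),
        'LogP': Descriptors.MolLogP(mol),
        'NumHDonors': Descriptors.NumHDonors(mol),
        'NumHAcceptors': Descriptors.NumHAcceptors(mol),
    }

aspirin = describe('CC(=O)Oc1ccccc1C(=O)O')
print(aspirin)

# An unbalanced parenthesis is a SMILES syntax error, so this yields None
print(describe('C(C'))
```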

## Ranking and Results Download

The molecules are ranked based on their computed properties, and a score is calculated for each to reflect its potential as a drug candidate. The table below displays the top-ranked molecules. The results are also saved as a CSV file called ranked_smiles.csv for further analysis or reporting.

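**Note**: The 'Score' column is the sum of the raw property values (MolWt, LogP, NumHDonors, NumHAcceptors, TPSA, QED, and Ro5) and serves as a simplified metric to gauge the overall promise of each molecule as a drug candidate. Because the values are not normalized, terms with large magnitudes, such as MolWt and TPSA, dominate the sum.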


In [3]:
from rdkit.Chem import Crippen, QED, rdMolDescriptors
import numpy as np

def calculate_properties(df):
    df['Mol'] = df['SMILES'].apply(Chem.MolFromSmiles)
    # Drop rows whose SMILES RDKit could not parse (MolFromSmiles returns None),
    # since the descriptor functions below raise an error on None
    df = df[df['Mol'].notna()].copy()
    df['MolWt'] = df['Mol'].apply(Descriptors.MolWt)
    df['LogP'] = df['Mol'].apply(Crippen.MolLogP)
    df['NumHDonors'] = df['Mol'].apply(Descriptors.NumHDonors)
    df['NumHAcceptors'] = df['Mol'].apply(Descriptors.NumHAcceptors)
    df['TPSA'] = df['Mol'].apply(rdMolDescriptors.CalcTPSA)
    df['QED'] = df['Mol'].apply(QED.qed)
    
    # Lipinski's Rule of Five (1 if all four criteria are met, else 0)
    df['Ro5'] = ((df['MolWt'] <= 500) & 
                 (df['LogP'] <= 5) & 
                 (df['NumHDonors'] <= 5) & 
                 (df['NumHAcceptors'] <= 10)).astype(int)
    
    # Simple ranking score: the unweighted sum of the raw property values
    df['Score'] = df[['MolWt', 'LogP', 'NumHDonors', 'NumHAcceptors', 'TPSA', 'QED', 'Ro5']].sum(axis=1)
    return df

# Load and calculate properties
df_properties = load_and_compute_properties('sampling.csv')
df_ranked = calculate_properties(df_properties)

# Sort by score (high score = more promising candidate)
df_ranked.sort_values(by='Score', ascending=False, inplace=True)

def display_ranked_properties(df):
    select_widget = widgets.SelectMultiple(
        options=list(df.columns),
        value=['SMILES', 'MolWt', 'LogP', 'TPSA', 'QED', 'Ro5', 'Score'],
        description='Columns',
        disabled=False
    )

    output_area = widgets.Output()
    display(select_widget, output_area)

    # Save the complete DataFrame to a CSV file
    df.to_csv('ranked_smiles.csv', index=False)

    def update_table(change):
        with output_area:
            clear_output(wait=True)
            display(df[list(select_widget.value)].head(10))  # Display top 10 molecules

    select_widget.observe(update_table, names='value')
    update_table(None)  # Initialize view

display_ranked_properties(df_ranked)


SelectMultiple(description='Columns', index=(0, 4, 5, 9, 10, 11, 12), options=('SMILES', 'Input_SMILES', 'Tani…

Output()
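
To make the Score arithmetic concrete, the sketch below applies the same Rule of Five thresholds and the same unweighted sum to two sets of descriptor values in plain Python. The values for `small` and `large` are hypothetical and chosen only to illustrate the calculation; the helper names `passes_ro5` and `raw_score` are likewise assumptions, not part of the notebook.

```python
PROPERTY_KEYS = ['MolWt', 'LogP', 'NumHDonors', 'NumHAcceptors', 'TPSA', 'QED']

def passes_ro5(p):
    """Return 1 if all four Rule of Five thresholds are met, else 0."""
    return int(p['MolWt'] <= 500 and p['LogP'] <= 5
               and p['NumHDonors'] <= 5 and p['NumHAcceptors'] <= 10)

def raw_score(p):
    """Unweighted sum of the raw property values plus the Ro5 flag."""
    return sum(p[k] for k in PROPERTY_KEYS) + passes_ro5(p)

# Hypothetical descriptor values, not taken from any real run
small = {'MolWt': 180.0, 'LogP': 1.0, 'NumHDonors': 1,
         'NumHAcceptors': 3, 'TPSA': 60.0, 'QED': 0.5}
large = {'MolWt': 650.0, 'LogP': 6.0, 'NumHDonors': 6,
         'NumHAcceptors': 12, 'TPSA': 180.0, 'QED': 0.1}

for name, props in [('small', small), ('large', large)]:
    print(name, 'Ro5 =', passes_ro5(props), 'Score =', raw_score(props))
```

With these made-up values, the molecule that fails the Rule of Five receives the larger Score, because MolWt and TPSA contribute far more to the sum than QED or the Ro5 flag. Rescaling each property to a common range before summing is one possible way to balance the terms.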

## Interactive 3D Visualization of Molecules

This section of the notebook lets you interactively visualize the molecular structures generated from SMILES strings. Select a molecule from the dropdown menu to view its 3D structure rendered below. RDKit generates the 3D coordinates, and py3Dmol renders them directly within the Jupyter notebook.

The dropdown menu is populated with indices and corresponding SMILES strings for ease of selection. Upon selecting a different molecule from the dropdown, the 3D visualization will update to show the new molecule. This allows for an intuitive and engaging way to explore the spatial arrangement of atoms and the overall geometry of the molecules, which is crucial for understanding their chemical properties and interactions.

**Instructions:**
1. Use the dropdown menu to select a molecule. The index of each molecule is shown for easy reference.
2. The 3D structure of the selected molecule will be displayed below. You can rotate and zoom the structure interactively.
3. To view a different molecule, simply select another option from the dropdown menu.


In [10]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import py3Dmol
from rdkit import Chem
from rdkit.Chem import AllChem  # Needed for EmbedMolecule below

def setup_viewer():
    viewer = py3Dmol.view(width=400, height=300)
    viewer.setBackgroundColor('white')
    return viewer

viewer = setup_viewer()

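def show_molecule(mol_smiles, viewer):
    if mol_smiles:
        mol = Chem.MolFromSmiles(mol_smiles)
        if mol is None:
            print(f"Could not parse SMILES: {mol_smiles}")
            return
        mol = Chem.AddHs(mol)  # Add hydrogens
        # Compute 3D coordinates; EmbedMolecule returns -1 if embedding fails
        if AllChem.EmbedMolecule(mol, randomSeed=42) == -1:
            print(f"Could not generate 3D coordinates for: {mol_smiles}")
            return
        mb = Chem.MolToMolBlock(mol)
        viewer.removeAllModels()  # Remove previous models
        viewer.addModel(mb, 'mol')
        viewer.setStyle({'stick': {}})
        viewer.zoomTo()
        viewer.show()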

# Create a dropdown to select a molecule by index and SMILES
options = [(f'Index {idx}: {smiles}', smiles) for idx, smiles in enumerate(df_ranked['SMILES'])]
smiles_dropdown = widgets.Dropdown(
    options=options,
    value=options[0][1],
    description='Select SMILES:',
    disabled=False,
)

output_area = widgets.Output()

display(smiles_dropdown, output_area)

def on_smiles_change(change):
    with output_area:
        clear_output(wait=True)  # Clear previous outputs to avoid display clutter
        show_molecule(change.new, viewer)

# Observe changes in the dropdown
smiles_dropdown.observe(on_smiles_change, names='value')

# Display the first molecule initially
with output_area:
    show_molecule(smiles_dropdown.value, viewer)


Dropdown(description='Select SMILES:', options=(('Index 0: O=C(Oc1ccccc1C(=O)O)c1cccc([N+](=O)[O-])c1', 'O=C(O…

Output()
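
The coordinate-generation step can also be tried on its own, without the viewer. The sketch below converts a SMILES string to a 3D MOL block with RDKit and returns `None` if parsing or embedding fails. The helper name `smiles_to_molblock` and its `seed` parameter are hypothetical; the sketch assumes RDKit is installed.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def smiles_to_molblock(smiles, seed=42):
    """Return a 3D MOL block for a SMILES string, or None on failure."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = Chem.AddHs(mol)  # Explicit hydrogens give a more realistic geometry
    # EmbedMolecule returns -1 when no conformer could be generated
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        return None
    return Chem.MolToMolBlock(mol)

block = smiles_to_molblock('CCO')  # ethanol
print(block.splitlines()[3])  # counts line: number of atoms and bonds
```

The returned string is in the same format that `show_molecule` passes to `viewer.addModel(mb, 'mol')`.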

## Conclusion

This notebook facilitates the exploration of new molecular entities by providing a streamlined computational workflow that integrates generation, property calculation, and dynamic analysis. By leveraging computational chemistry and machine learning, it assists researchers in identifying novel compounds with potential therapeutic benefits.
