<a href="https://colab.research.google.com/github/HFooladi/GNNs-For-Chemists/blob/main/notebooks/01.1_GNN_3D_representation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 3D Molecular Representation for Graph Neural Networks: Tutorial

## Table of Contents
1. [Setup and Installation](#setup-and-installation)
2. [Introduction to 3D Molecular Representation](#introduction)
3. [From 2D to 3D: Understanding Conformers](#from-2d-to-3d-understanding-conformers)
4. [Generating 3D Molecular Conformations](#generating-3d-molecular-conformations)
5. [3D Graph Construction from Conformers](#3d-graph-construction-from-conformers)
6. [Distance-Based Edge Features](#distance-based-edge-features)
7. [3D Visualization of Molecular Graphs](#3d-visualization-of-molecular-graphs)
8. [Comparing 2D vs 3D Representations](#comparing-2d-vs-3d-representations)
9. [Advanced 3D Features](#advanced-3d-features)
10. [Conclusion](#conclusion)

## 1. Setup and Installation <a name="setup-and-installation"></a>

Building on the previous tutorial, we'll now explore 3D molecular representations. We'll need additional libraries for 3D visualization and conformer generation:
- **Plotly**: For interactive 3D visualizations
- **py3Dmol**: For molecular 3D visualization
- **Additional RDKit functions**: For conformer generation

In [None]:
#@title install required libraries
!pip install -q rdkit
!pip install -q torch_geometric
!pip install -q plotly
!pip install -q py3dmol

Now let's import the libraries we'll need throughout this tutorial:

In [None]:
#@title Import required libraries
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D

# RDKit for molecular handling
import rdkit
from rdkit import Chem
from rdkit.Chem import Draw, AllChem, Descriptors
from rdkit.Chem.rdMolAlign import AlignMol
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

# PyTorch and PyTorch Geometric
import torch
from torch_geometric.data import Data
from torch_geometric.datasets import MoleculeNet
from torch_geometric.utils import to_networkx

# 3D visualization libraries
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
try:
    import py3Dmol
    PY3DMOL_AVAILABLE = True
except ImportError:
    PY3DMOL_AVAILABLE = False
    print("py3Dmol not available - some 3D visualizations will be skipped")

# NetworkX for graph visualization
import networkx as nx

# Set plotting style
sns.set_context("notebook", font_scale=1.2)
sns.set_palette("Set2")

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

## 2. Introduction to 3D Molecular Representation <a name="introduction"></a>

In the previous tutorial, we learned how to represent molecules as 2D graphs based on their connectivity (which atoms are bonded to which). However, **chemistry happens in 3D space**. The actual shape and spatial arrangement of atoms in a molecule profoundly affects its properties:

- **Drug-receptor interactions** depend on the 3D shape complementarity
- **Catalytic activity** is determined by the spatial arrangement of active sites
- **Molecular properties** like boiling point and solubility are influenced by molecular volume and surface area
- **Stereochemistry** is not captured by connectivity alone; telling stereoisomers apart requires spatial information (or explicit stereo annotations)

### Why 3D Matters in Chemistry

Consider these examples where 3D structure is crucial:
1. **Enantiomers**: Mirror-image molecules with identical connectivity that can have very different biological activities
2. **Conformational flexibility**: The same molecule can adopt different 3D shapes
3. **Steric hindrance**: Bulky groups preventing reactions due to spatial constraints
4. **Binding pockets**: Drug molecules must fit precisely into protein binding sites

### Learning Objectives

By the end of this tutorial, you will be able to:
- **Generate** 3D molecular conformations from SMILES strings
- **Construct** distance-based molecular graphs incorporating spatial information
- **Visualize** molecules in 3D space with their corresponding graph representations
- **Compare** 2D topology-based vs 3D geometry-based molecular graphs
- **Understand** how 3D features enhance molecular property prediction

## 3. From 2D to 3D: Understanding Conformers <a name="from-2d-to-3d-understanding-conformers"></a>

### What is a Molecular Conformer?

A **conformer** (or conformation) is a specific 3D arrangement of atoms in a molecule that can be achieved by rotation around single bonds without breaking any covalent bonds. Unlike 2D structural representations that only show connectivity, conformers capture the actual spatial positions of atoms.

Key concepts:
- **Constitutional isomers**: Different connectivity (different molecules)
- **Conformers**: Same connectivity, different 3D arrangements (same molecule, different shapes)  
- **Conformational energy**: Energy of a specific 3D arrangement, usually reported relative to the lowest-energy conformer
- **Preferred conformations**: Low-energy, stable 3D arrangements

Let's start with a simple example to understand the difference between 2D connectivity and 3D geometry:

In [None]:
def demonstrate_2d_vs_3d_concept():
    """
    Demonstrate the difference between 2D connectivity and 3D spatial arrangement.
    """
    # Create a simple molecule: butane (C4H10)
    butane_smiles = "CCCC"
    mol = Chem.MolFromSmiles(butane_smiles)
    mol = Chem.AddHs(mol)  # Add hydrogens explicitly
    
    print("Butane (C4H10) - Same connectivity, different 3D shapes")
    print("SMILES:", butane_smiles)
    print("Molecular formula:", CalcMolFormula(mol))
    print("Number of atoms:", mol.GetNumAtoms())
    print("Number of rotatable bonds:", Descriptors.NumRotatableBonds(mol))
    print("\nThis molecule can adopt multiple 3D conformations by rotating around C-C bonds")
    
    # Display 2D structure
    plt.figure(figsize=(10, 4))
    plt.subplot(1, 2, 1)
    img = Draw.MolToImage(mol, size=(300, 200))
    plt.imshow(img)
    plt.axis('off')
    plt.title('2D Structure\n(Shows Connectivity Only)')
    
    plt.subplot(1, 2, 2)
    plt.text(0.1, 0.7, '3D Conformations:', fontsize=14, weight='bold')
    plt.text(0.1, 0.6, '• Extended (anti) conformation', fontsize=12)
    plt.text(0.1, 0.5, '• Gauche conformation', fontsize=12)  
    plt.text(0.1, 0.4, '• Folded conformations', fontsize=12)
    plt.text(0.1, 0.3, '• And many others...', fontsize=12)
    plt.text(0.1, 0.1, 'Each has different:\n- Energy\n- Shape\n- Properties', fontsize=10, style='italic')
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.axis('off')
    plt.title('Multiple 3D Shapes Possible')
    
    plt.tight_layout()
    plt.show()

demonstrate_2d_vs_3d_concept()
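
Conformers of butane differ mainly in the dihedral (torsion) angle around the central C-C bond. As a minimal sketch of how that angle can be measured from four 3D positions, the block below uses only the standard library. The helper names and the `toy_chain` coordinates are hypothetical and chosen for illustration; they are not taken from RDKit or from a real butane geometry.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points (range [-180, 180])."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def toy_chain(phi_degrees):
    """Hypothetical 4-point chain whose 0-1-2-3 dihedral equals phi_degrees."""
    phi = math.radians(phi_degrees)
    return [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
            (math.cos(phi), math.sin(phi), 1.0)]

for label, phi in [("syn", 0.0), ("gauche", 60.0), ("anti", 180.0)]:
    print(f"{label}: {dihedral_degrees(*toy_chain(phi)):.1f} degrees")
```

All three toy chains share the same connectivity (0-1, 1-2, 2-3); only the dihedral angle, and therefore the 3D shape, changes.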

### The Conformational Landscape

Molecules exist in a **conformational landscape** - an energy surface where each point represents a different 3D arrangement. Understanding this landscape is crucial for:
- Drug design (active conformations)
- Reaction mechanisms (transition states)
- Material properties (packing arrangements)

In [None]:
def visualize_conformational_concept():
    """
    Create a conceptual diagram of conformational space.
    """
    # Create a sample energy landscape
    x = np.linspace(0, 360, 361)
    # Simplified, illustrative torsional profile for the butane C-C-C-C dihedral
    # (not a fitted potential): eclipsed maxima at 0°, 120° and 240°,
    # gauche minima near 62° and 298°, anti minimum at 180°
    energy = 0.7 * (1 + np.cos(np.radians(x))) + 1.6 * (1 + np.cos(np.radians(3*x)))
    
    plt.figure(figsize=(12, 6))
    
    plt.subplot(1, 2, 1)
    plt.plot(x, energy, 'b-', linewidth=2)
    plt.xlabel('Dihedral Angle (degrees)')
    plt.ylabel('Relative Energy (kcal/mol)')
    plt.title('Conformational Energy Profile')
    plt.grid(True, alpha=0.3)
    
    # Mark stable conformations: locate the local minima of the sampled profile
    is_minimum = (energy[1:-1] < energy[:-2]) & (energy[1:-1] < energy[2:])
    stable_angles = x[1:-1][is_minimum]
    stable_energies = energy[1:-1][is_minimum]
    plt.scatter(stable_angles, stable_energies, color='red', s=100, zorder=5)
    plt.annotate('Stable\nConformation', xy=(stable_angles[0], stable_energies[0]), 
                xytext=(stable_angles[0] + 40, stable_energies[0] + 1.5),
                arrowprops=dict(arrowstyle='->', color='red'))
    
    plt.subplot(1, 2, 2)
    plt.text(0.1, 0.8, 'Key Concepts:', fontsize=14, weight='bold')
    plt.text(0.1, 0.7, '• Energy minima = stable conformers', fontsize=12)
    plt.text(0.1, 0.6, '• Energy barriers between conformers', fontsize=12)
    plt.text(0.1, 0.5, '• Room temperature accessible conformers', fontsize=12)
    plt.text(0.1, 0.4, '• Conformer populations follow Boltzmann distribution', fontsize=12)
    
    plt.text(0.1, 0.25, 'For GNNs:', fontsize=14, weight='bold', color='purple')
    plt.text(0.1, 0.15, '• Which conformer to use?', fontsize=12, color='purple')
    plt.text(0.1, 0.05, '• How to incorporate flexibility?', fontsize=12, color='purple')
    
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.axis('off')
    plt.title('Conformational Considerations for ML')
    
    plt.tight_layout()
    plt.show()

visualize_conformational_concept()
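
To make the Boltzmann statement concrete, here is a small worked sketch using only the standard library. It assumes, purely for illustration, that the gauche form of butane lies about 0.9 kcal/mol above the anti form and that there are two mirror-image gauche conformers (degeneracy 2). The function name `boltzmann_populations` is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_populations(rel_energies, degeneracies, temperature=298.15):
    """Fractional populations from relative energies (kcal/mol) and degeneracies."""
    kT = R_KCAL * temperature
    weights = [g * math.exp(-e / kT) for e, g in zip(rel_energies, degeneracies)]
    total = sum(weights)
    return [w / total for w in weights]

# Assumed illustrative inputs: anti at 0.0, gauche at 0.9 kcal/mol (two equivalent forms)
pops = boltzmann_populations([0.0, 0.9], [1, 2])
for label, p in zip(["anti", "gauche (both)"], pops):
    print(f"{label}: {p * 100:.1f}%")
```

Under these assumptions the higher-energy gauche forms still hold a sizeable share of the population at room temperature, which is why a single conformer can be an incomplete description of a flexible molecule.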

## 4. Generating 3D Molecular Conformations <a name="generating-3d-molecular-conformations"></a>

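RDKit provides several methods for generating 3D conformations from SMILES strings. The most common approach uses the **ETKDG** (Experimental-Torsion Knowledge Distance Geometry) algorithm, which augments distance geometry with experimentally derived torsion-angle preferences and basic chemical knowledge such as flat aromatic rings.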

### Single Conformer Generation

Let's start by generating a single, optimized 3D conformer:

In [None]:
def generate_3d_conformer(smiles: str, optimize=True):
    """
    Generate a single 3D conformer from a SMILES string.
    
    Args:
        smiles (str): SMILES string of the molecule
        optimize (bool): Whether to optimize the conformer geometry
    
    Returns:
        rdkit.Chem.Mol: Molecule with 3D coordinates
    """
    # Step 1: Create molecule from SMILES
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES string: {smiles}")
    
    # Step 2: Add hydrogens (essential for realistic 3D geometry)
    mol = Chem.AddHs(mol)
    
    # Step 3: Generate 3D coordinates using ETKDG
    # ETKDG = Experimental-Torsion Knowledge Distance Geometry
    # This method uses experimentally derived torsion-angle preferences
    params = AllChem.ETKDGv3()
    params.randomSeed = 42  # For reproducible results
    
    # Generate the conformer
    conf_id = AllChem.EmbedMolecule(mol, params)
    
    if conf_id == -1:
        raise RuntimeError(f"Could not generate 3D coordinates for {smiles}")
    
    # Step 4: Optimize geometry using MMFF force field (optional but recommended)
    if optimize:
        # MMFF94 is a molecular mechanics force field for geometry optimization
        AllChem.MMFFOptimizeMolecule(mol, confId=conf_id)
    
    return mol

# Test the function with several molecules
test_molecules = {
    "Methanol": "CO",
    "Ethanol": "CCO", 
    "Cyclohexane": "C1CCCCC1",
    "Benzene": "c1ccccc1",
    "Aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O"
}

print("Generating 3D conformers for test molecules:")
print("=" * 50)

conformers = {}
for name, smiles in test_molecules.items():
    try:
        mol_3d = generate_3d_conformer(smiles)
        conformers[name] = mol_3d
        
        # Get some basic 3D properties
        conf = mol_3d.GetConformer()
        n_atoms = mol_3d.GetNumAtoms()
        
        # Calculate molecular volume (approximate)
        positions = []
        for i in range(n_atoms):
            pos = conf.GetAtomPosition(i)
            positions.append([pos.x, pos.y, pos.z])
        positions = np.array(positions)
        
        # Calculate bounding box volume as rough estimate
        ranges = np.ptp(positions, axis=0)  # peak-to-peak (max - min) for each dimension
        bbox_volume = np.prod(ranges)
        
        print(f"{name}:")
        print(f"  SMILES: {smiles}")
        print(f"  Atoms: {n_atoms}")
        print(f"  3D Bounding Box Volume: {bbox_volume:.2f} Å³")
        print(f"  Conformer generated successfully ✓")
        print()
        
    except Exception as e:
        print(f"{name}: Failed - {e}")
        print()

### Multiple Conformer Generation

For flexible molecules, it's often useful to generate multiple conformers to sample the conformational space:

In [None]:
def generate_multiple_conformers(smiles: str, n_conformers=10, optimize=True):
    """
    Generate multiple 3D conformers for a molecule.
    
    Args:
        smiles (str): SMILES string of the molecule
        n_conformers (int): Number of conformers to generate
        optimize (bool): Whether to optimize conformer geometries
    
    Returns:
        tuple: (mol with conformers, list of energies)
    """
    # Create molecule and add hydrogens
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES string: {smiles}")
    mol = Chem.AddHs(mol)
    
    # Set up parameters for conformer generation
    params = AllChem.ETKDGv3()
    params.randomSeed = 42
    params.numThreads = 0  # Use all available cores
    
    # Generate multiple conformers
    conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=n_conformers, params=params)
    
    if not conf_ids:
        raise RuntimeError(f"Could not generate conformers for {smiles}")
    
    # Optimize each conformer and calculate energies
    energies = []
    if optimize:
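        # MMFF94 atom typing depends only on the molecule, so compute it once
        props = AllChem.MMFFGetMoleculeProperties(mol)
        for conf_id in conf_ids:
            # Build a force field for this conformer (unavailable if MMFF lacks parameters)
            ff = None
            if props is not None:
                ff = AllChem.MMFFGetMoleculeForceField(mol, props, confId=conf_id)
            if ff is not None:
                ff.Minimize()
                energy = ff.CalcEnergy()
                energies.append(energy)
            else:
                energies.append(float('inf'))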
    else:
        energies = [0.0] * len(conf_ids)
    
    return mol, energies

# Generate multiple conformers for a flexible molecule (butane)
butane_smiles = "CCCC"
print(f"Generating multiple conformers for butane ({butane_smiles}):")
print("=" * 50)

try:
    butane_conformers, energies = generate_multiple_conformers(butane_smiles, n_conformers=20)
    
    # Sort conformers by energy
    sorted_indices = np.argsort(energies)
    sorted_energies = np.array(energies)[sorted_indices]
    
    print(f"Generated {len(energies)} conformers")
    print(f"Energy range: {min(energies):.2f} to {max(energies):.2f} kcal/mol")
    print(f"Energy spread: {max(energies) - min(energies):.2f} kcal/mol")
    print()
    
    # Show top 5 lowest energy conformers
    print("Top 5 lowest energy conformers:")
    for i, idx in enumerate(sorted_indices[:5]):
        rel_energy = sorted_energies[i] - sorted_energies[0]  # Relative to lowest
        print(f"  Conformer {idx}: {energies[idx]:.2f} kcal/mol (ΔE = {rel_energy:.2f})")
    
    # Calculate Boltzmann populations at room temperature
    # (rough estimate: embedded conformers that relax to the same minimum are not merged)
    kT = 0.593  # kcal/mol at 298K
    exp_factors = np.exp(-(sorted_energies - sorted_energies[0]) / kT)
    populations = exp_factors / np.sum(exp_factors)
    
    print(f"\nBoltzmann populations at 298K (top 3):")
    for i in range(min(3, len(populations))):
        print(f"  Conformer {sorted_indices[i]}: {populations[i]*100:.1f}%")
        
except Exception as e:
    print(f"Error: {e}")

## 5. 3D Graph Construction from Conformers <a name="3d-graph-construction-from-conformers"></a>

Now that we can generate 3D conformers, let's create molecular graphs that incorporate spatial information. Unlike 2D graphs that only consider chemical bonds, 3D graphs can include:
1. **Covalent bonds** (traditional edges)
2. **Distance-based edges** (atoms within a certain distance)
3. **Spatial features** (coordinates, distances, angles)

### Extracting 3D Coordinates

First, let's create a function to extract 3D coordinates from conformers:

In [None]:
def extract_3d_coordinates(mol, conf_id=0):
    """
    Extract 3D coordinates from a molecular conformer.
    
    Args:
        mol: RDKit molecule with 3D coordinates
        conf_id: Conformer ID to use (default: 0)
    
    Returns:
        numpy.ndarray: Array of shape (n_atoms, 3) with x, y, z coordinates
    """
    conformer = mol.GetConformer(conf_id)
    coordinates = []
    
    for atom_idx in range(mol.GetNumAtoms()):
        pos = conformer.GetAtomPosition(atom_idx)
        coordinates.append([pos.x, pos.y, pos.z])
    
    return np.array(coordinates)

def get_atomic_features_3d(mol):
    """
    Extract atomic features including 3D-specific properties.
    
    Args:
        mol: RDKit molecule with 3D coordinates
    
    Returns:
        tuple: (feature matrix of shape (n_atoms, n_features) including 3D features,
                coordinates of shape (n_atoms, 3))
    """
    coordinates = extract_3d_coordinates(mol)
    n_atoms = mol.GetNumAtoms()
    
    # Basic features (from previous tutorial)
    basic_features = []
    for atom in mol.GetAtoms():
        atom_type = atom.GetSymbol()
        atomic_num = atom.GetAtomicNum()
        formal_charge = atom.GetFormalCharge()
        hybridization = atom.GetHybridization()
        is_aromatic = int(atom.GetIsAromatic())
        is_in_ring = int(atom.IsInRing())
        
        # One-hot encoding for atom types
        atom_types = ['C', 'O', 'N', 'H', 'F', 'P', 'S', 'Cl', 'Br', 'I']
        atom_type_onehot = [1 if atom_type == t else 0 for t in atom_types]
        if atom_type not in atom_types:
            atom_type_onehot.append(1)  # "Other"
        else:
            atom_type_onehot.append(0)
        
        features = atom_type_onehot + [
            formal_charge,
            is_aromatic,
            is_in_ring,
            atom.GetDegree(),
            atom.GetTotalNumHs(),
            atom.GetNumRadicalElectrons()
        ]
        basic_features.append(features)
    
    basic_features = np.array(basic_features)
    
    # Add 3D coordinates as features
    # (note: raw coordinates change when the molecule is rotated or translated)
    enhanced_features = np.concatenate([basic_features, coordinates], axis=1)
    
    # Add distance-based features
    # Unweighted mean position, i.e. the geometric centroid (not mass-weighted)
    centroid = np.mean(coordinates, axis=0)
    distances_to_centroid = np.linalg.norm(coordinates - centroid, axis=1)
    
    # Add distance to the centroid as a feature
    enhanced_features = np.concatenate([enhanced_features, distances_to_centroid.reshape(-1, 1)], axis=1)
    
    return enhanced_features, coordinates

# Test the coordinate extraction
print("Testing 3D coordinate extraction:")
print("=" * 40)

for name, mol in conformers.items():
    coords = extract_3d_coordinates(mol)
    features, _ = get_atomic_features_3d(mol)
    
    print(f"{name}:")
    print(f"  Shape of coordinates: {coords.shape}")
    print(f"  Shape of features: {features.shape}")
    print(f"  Coordinate range:")
    print(f"    X: {coords[:, 0].min():.2f} to {coords[:, 0].max():.2f}")
    print(f"    Y: {coords[:, 1].min():.2f} to {coords[:, 1].max():.2f}")
    print(f"    Z: {coords[:, 2].min():.2f} to {coords[:, 2].max():.2f}")
    print()
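
One caveat about using raw coordinates as node features: they depend on how the molecule happens to be oriented in space, whereas interatomic distances do not. The sketch below illustrates this with the standard library only. The three water-like coordinates and the helper names are hypothetical and are not produced by RDKit.

```python
import math

def rotate_z(points, angle_degrees):
    """Rotate 3D points about the z-axis."""
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def translate(points, shift):
    """Shift every point by the same vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in points]

def pairwise_distances(points):
    """Full matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

# Hypothetical water-like coordinates in Angstroms (illustrative, not optimized)
original = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = translate(rotate_z(original, 40.0), (1.5, -2.0, 0.7))

max_coord_change = max(
    abs(a - b) for p, q in zip(original, moved) for a, b in zip(p, q)
)
max_dist_change = max(
    abs(d0 - d1)
    for row0, row1 in zip(pairwise_distances(original), pairwise_distances(moved))
    for d0, d1 in zip(row0, row1)
)
print(f"Largest coordinate change: {max_coord_change:.3f} Å")
print(f"Largest distance change:   {max_dist_change:.2e} Å")
```

The coordinates shift by Angstroms while the distances agree to floating-point precision, which is one reason distance-based features are a common choice when a model should not depend on molecular orientation.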

### Distance-Based Edge Construction

In 3D molecular graphs, we can define edges not just based on chemical bonds, but also based on spatial proximity. This lets the graph represent through-space contacts between atoms that are not bonded, which is where non-covalent interactions arise:

In [None]:
def create_3d_molecular_graph(mol, cutoff_distance=5.0, include_bond_edges=True):
    """
    Create a 3D molecular graph with distance-based edges.
    
    Args:
        mol: RDKit molecule with 3D coordinates
        cutoff_distance: Maximum distance for creating edges (Angstroms)
        include_bond_edges: Whether to include covalent bond edges
    
    Returns:
        tuple: (node_features, edge_indices, edge_features, coordinates)
    """
    # Get enhanced node features and coordinates
    node_features, coordinates = get_atomic_features_3d(mol)
    n_atoms = mol.GetNumAtoms()
    
    # Calculate all pairwise distances (vectorized; the diagonal is zero)
    diffs = coordinates[:, None, :] - coordinates[None, :, :]
    distance_matrix = np.linalg.norm(diffs, axis=-1)
    
    # Create edge lists
    edge_indices = []
    edge_features = []
    
    # Add covalent bond edges (if requested)
    if include_bond_edges:
        for bond in mol.GetBonds():
            i = bond.GetBeginAtomIdx()
            j = bond.GetEndAtomIdx()
            distance = distance_matrix[i, j]
            
            # Edge features: [is_covalent, distance, 1/distance]
            edge_feature = [1.0, distance, 1.0/distance if distance > 0 else 0.0]
            
            # Add edge in both directions
            edge_indices.extend([(i, j), (j, i)])
            edge_features.extend([edge_feature, edge_feature])
    
    # Add distance-based edges
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            distance = distance_matrix[i, j]
            
            # Create edge if within cutoff and not already a covalent bond
            if distance <= cutoff_distance:
                # Check if this is already a covalent bond
                is_covalent_bond = False
                if include_bond_edges:
                    bond = mol.GetBondBetweenAtoms(i, j)
                    is_covalent_bond = bond is not None
                
                if not is_covalent_bond:
                    # Edge features: [is_covalent, distance, 1/distance]
                    edge_feature = [0.0, distance, 1.0/distance if distance > 0 else 0.0]
                    
                    # Add edge in both directions
                    edge_indices.extend([(i, j), (j, i)])
                    edge_features.extend([edge_feature, edge_feature])
    
    return node_features, edge_indices, edge_features, coordinates

# Test 3D graph construction
print("Testing 3D molecular graph construction:")
print("=" * 45)

# Test with different cutoff distances
cutoff_distances = [3.0, 4.0, 5.0]
test_mol = conformers["Cyclohexane"]

for cutoff in cutoff_distances:
    node_feat, edge_idx, edge_feat, coords = create_3d_molecular_graph(
        test_mol, cutoff_distance=cutoff
    )
    
    # Count different edge types
    edge_feat_array = np.array(edge_feat)
    covalent_edges = np.sum(edge_feat_array[:, 0] == 1.0) // 2  # Divide by 2 (bidirectional)
    distance_edges = np.sum(edge_feat_array[:, 0] == 0.0) // 2
    
    print(f"Cutoff {cutoff} Å:")
    print(f"  Total edges: {len(edge_idx)}")
    print(f"  Covalent edges: {covalent_edges}")
    print(f"  Distance-based edges: {distance_edges}")
    print(f"  Average edge distance: {np.mean(edge_feat_array[:, 1]):.2f} Å")
    print()

## 6. Distance-Based Edge Features <a name="distance-based-edge-features"></a>

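3D molecular graphs can incorporate rich features based on spatial relationships. Here we summarize the edge distances of each graph and compute a few whole-molecule geometric descriptors: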

In [None]:
def calculate_advanced_3d_features(mol, coordinates):
    """
    Calculate advanced 3D features for molecular graphs.
    """
    n_atoms = mol.GetNumAtoms()
    
    # Calculate molecular descriptors
    features = {
        'molecular_weight': Descriptors.MolWt(mol),
        'volume_estimate': np.prod(np.ptp(coordinates, axis=0)),  # Bounding box volume
        'surface_area_estimate': 0.0,  # Would need more complex calculation
        'radius_of_gyration': 0.0
    }
    
    # Calculate radius of gyration about the geometric centroid
    # (unit atomic weights; a mass-weighted version would use atomic masses)
    centroid = np.mean(coordinates, axis=0)
    rg_squared = np.mean(np.sum((coordinates - centroid)**2, axis=1))
    features['radius_of_gyration'] = np.sqrt(rg_squared)
    
    return features

def analyze_3d_graph_properties(mol, cutoff_distance=5.0):
    """
    Analyze properties of 3D molecular graph.
    """
    node_feat, edge_idx, edge_feat, coords = create_3d_molecular_graph(
        mol, cutoff_distance=cutoff_distance
    )
    
    # Basic graph statistics
    n_nodes = len(node_feat)
    n_edges = len(edge_idx) // 2  # Bidirectional edges
    
    # Edge statistics
    edge_feat_array = np.array(edge_feat)
    distances = edge_feat_array[:, 1]
    covalent_mask = edge_feat_array[:, 0] == 1.0
    
    covalent_distances = distances[covalent_mask]
    noncovalent_distances = distances[~covalent_mask]
    
    results = {
        'n_nodes': n_nodes,
        'n_edges': n_edges,
        'avg_covalent_distance': np.mean(covalent_distances) if len(covalent_distances) > 0 else 0,
        'avg_noncovalent_distance': np.mean(noncovalent_distances) if len(noncovalent_distances) > 0 else 0,
        'max_distance': np.max(distances),
        'min_distance': np.min(distances),
        'coordinates': coords
    }
    
    return results

# Analyze different molecules
print("3D Graph Analysis for Different Molecules:")
print("=" * 50)

for name, mol in conformers.items():
    print(f"\n{name}:")
    analysis = analyze_3d_graph_properties(mol, cutoff_distance=4.0)
    
    print(f"  Nodes: {analysis['n_nodes']}")
    print(f"  Edges: {analysis['n_edges']}")
    print(f"  Avg covalent distance: {analysis['avg_covalent_distance']:.2f} Å")
    print(f"  Avg non-covalent distance: {analysis['avg_noncovalent_distance']:.2f} Å")
    print(f"  Distance range: {analysis['min_distance']:.2f} - {analysis['max_distance']:.2f} Å")
    
    # Calculate 3D features
    features_3d = calculate_advanced_3d_features(mol, analysis['coordinates'])
    print(f"  Molecular weight: {features_3d['molecular_weight']:.1f} g/mol")
    print(f"  Volume estimate: {features_3d['volume_estimate']:.1f} Å³")
    print(f"  Radius of gyration: {features_3d['radius_of_gyration']:.2f} Å")
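
The edge features built so far use the raw distance and its inverse. A common alternative is to expand each distance into a vector of Gaussian radial basis functions, similar in spirit to the distance expansions used by models such as SchNet. The following is a minimal sketch of that idea, not the scheme used elsewhere in this tutorial. The function name `gaussian_rbf`, the number of centers, and the width rule are all assumptions made for illustration.

```python
import math

def gaussian_rbf(distance, cutoff=5.0, n_centers=8, gamma=None):
    """Expand a scalar distance into n_centers Gaussian basis values.

    Centers are spaced evenly on [0, cutoff]; n_centers must be at least 2.
    """
    spacing = cutoff / (n_centers - 1)
    centers = [k * spacing for k in range(n_centers)]
    if gamma is None:
        # Assumed choice: tie the Gaussian width to the spacing between centers
        gamma = 1.0 / (2.0 * spacing ** 2)
    return [math.exp(-gamma * (distance - c) ** 2) for c in centers]

# Illustrative distances in Angstroms (roughly C-H, C-C, and a through-space contact)
for d in (1.09, 1.53, 3.90):
    values = ", ".join(f"{v:.2f}" for v in gaussian_rbf(d))
    print(f"d = {d:.2f} Å -> [{values}]")
```

Each distance activates mostly the basis functions whose centers lie nearby, giving a smooth, fixed-length encoding that can replace or complement the `[distance, 1/distance]` pair.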

## 7. 3D Visualization of Molecular Graphs <a name="3d-visualization-of-molecular-graphs"></a>

Visualizing 3D molecular graphs helps us understand the spatial relationships between atoms:

In [None]:
def plot_3d_molecular_graph(mol, cutoff_distance=4.0, show_all_edges=False):
    """
    Create an interactive 3D visualization of a molecular graph.
    """
    # Get graph data
    node_feat, edge_idx, edge_feat, coords = create_3d_molecular_graph(
        mol, cutoff_distance=cutoff_distance
    )
    
    # Get atom symbols
    atom_symbols = [atom.GetSymbol() for atom in mol.GetAtoms()]
    
    # Color mapping for atoms
    atom_colors = {
        'H': 'lightgray', 'C': 'black', 'N': 'blue', 'O': 'red',
        'F': 'green', 'P': 'orange', 'S': 'yellow', 'Cl': 'green'
    }
    
    colors = [atom_colors.get(symbol, 'purple') for symbol in atom_symbols]
    
    # Create 3D scatter plot
    fig = go.Figure()
    
    # Add atoms as scatter points
    fig.add_trace(go.Scatter3d(
        x=coords[:, 0], y=coords[:, 1], z=coords[:, 2],
        mode='markers+text',
        marker=dict(size=8, color=colors, opacity=0.8),
        text=atom_symbols,
        textposition="middle center",
        name="Atoms"
    ))
    
    # Add edges
    edge_feat_array = np.array(edge_feat)
    edge_x, edge_y, edge_z = [], [], []
    
    for i, (start, end) in enumerate(edge_idx):
        if i % 2 == 0:  # Only draw each edge once
            edge_x.extend([coords[start, 0], coords[end, 0], None])
            edge_y.extend([coords[start, 1], coords[end, 1], None])
            edge_z.extend([coords[start, 2], coords[end, 2], None])
    
    # Add edges as lines
    fig.add_trace(go.Scatter3d(
        x=edge_x, y=edge_y, z=edge_z,
        mode='lines',
        line=dict(color='gray', width=2),
        name="Bonds/Interactions"
    ))
    
    # Update layout
    fig.update_layout(
        title=f"3D Molecular Graph (cutoff: {cutoff_distance} Å)",
        scene=dict(
            xaxis_title="X (Å)",
            yaxis_title="Y (Å)",
            zaxis_title="Z (Å)",
            aspectmode='data'  # keep true proportions; 'cube' would stretch flat molecules
        ),
        showlegend=True,
        width=800,
        height=600
    )
    
    return fig

# Create 3D visualizations for selected molecules
molecules_to_visualize = ["Cyclohexane", "Benzene"]

for name in molecules_to_visualize:
    if name in conformers:
        print(f"Creating 3D visualization for {name}...")
        fig = plot_3d_molecular_graph(conformers[name], cutoff_distance=4.0)
        fig.show()

### Comparing Different Cutoff Distances

Let's see how the graph structure changes with different distance cutoffs:

In [None]:
def compare_cutoff_distances(mol, cutoffs=(3.0, 4.0, 5.0), molecule_name="Molecule"):
    """
    Compare 3D molecular graphs with different cutoff distances.
    """
    fig = make_subplots(
        rows=1, cols=len(cutoffs),
        specs=[[{'type': 'scatter3d'} for _ in cutoffs]],
        subplot_titles=[f"Cutoff: {c} Å" for c in cutoffs]
    )
    
    atom_symbols = [atom.GetSymbol() for atom in mol.GetAtoms()]
    atom_colors = {
        'H': 'lightgray', 'C': 'black', 'N': 'blue', 'O': 'red',
        'F': 'green', 'P': 'orange', 'S': 'yellow', 'Cl': 'green'
    }
    colors = [atom_colors.get(symbol, 'purple') for symbol in atom_symbols]
    
    for col, cutoff in enumerate(cutoffs, 1):
        # Get graph data
        node_feat, edge_idx, edge_feat, coords = create_3d_molecular_graph(
            mol, cutoff_distance=cutoff
        )
        
        # Add atoms
        fig.add_trace(
            go.Scatter3d(
                x=coords[:, 0], y=coords[:, 1], z=coords[:, 2],
                mode='markers+text',
                marker=dict(size=6, color=colors, opacity=0.8),
                text=atom_symbols,
                textposition="middle center",
                name=f"Atoms ({cutoff}Å)",
                showlegend=(col == 1)
            ),
            row=1, col=col
        )
        
        # Add edges
        edge_x, edge_y, edge_z = [], [], []
        for i, (start, end) in enumerate(edge_idx):
            if i % 2 == 0:  # Only draw each edge once
                edge_x.extend([coords[start, 0], coords[end, 0], None])
                edge_y.extend([coords[start, 1], coords[end, 1], None])
                edge_z.extend([coords[start, 2], coords[end, 2], None])
        
        fig.add_trace(
            go.Scatter3d(
                x=edge_x, y=edge_y, z=edge_z,
                mode='lines',
                line=dict(color='gray', width=1),
                name=f"Edges ({cutoff}Å)",
                showlegend=(col == 1)
            ),
            row=1, col=col
        )
    
    fig.update_layout(
        title=f"{molecule_name}: Effect of Distance Cutoff",
        height=500
    )
    # Apply the same aspect mode to every 3D subplot, however many cutoffs were given
    fig.update_scenes(aspectmode='data')
    
    return fig

# Compare cutoff distances for cyclohexane
if "Cyclohexane" in conformers:
    fig = compare_cutoff_distances(conformers["Cyclohexane"], molecule_name="Cyclohexane")
    fig.show()

## 8. Comparing 2D vs 3D Representations <a name="comparing-2d-vs-3d-representations"></a>

Let's directly compare 2D connectivity-based graphs with 3D distance-based graphs:

In [None]:
def compare_2d_vs_3d_graphs(mol, cutoff_distance=4.0):
    """
    Compare 2D (connectivity-based) vs 3D (distance-based) molecular graphs.
    """
    # Get 2D graph (connectivity only)
    adjacency_2d = np.zeros((mol.GetNumAtoms(), mol.GetNumAtoms()))
    edges_2d = []
    
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        adjacency_2d[i, j] = 1
        adjacency_2d[j, i] = 1
        edges_2d.extend([(i, j), (j, i)])
    
    # Get 3D graph
    node_feat_3d, edges_3d, edge_feat_3d, coords = create_3d_molecular_graph(
        mol, cutoff_distance=cutoff_distance
    )
    
    # Analysis
    n_atoms = mol.GetNumAtoms()
    n_edges_2d = len(edges_2d) // 2
    n_edges_3d = len(edges_3d) // 2
    
    # Count covalent vs non-covalent edges in 3D
    edge_feat_array = np.array(edge_feat_3d)
    covalent_edges_3d = np.sum(edge_feat_array[:, 0] == 1.0) // 2
    noncovalent_edges_3d = np.sum(edge_feat_array[:, 0] == 0.0) // 2
    
    print("2D vs 3D Graph Comparison:")
    print("=" * 30)
    print(f"Number of atoms: {n_atoms}")
    print(f"2D edges (bonds only): {n_edges_2d}")
    print(f"3D total edges: {n_edges_3d}")
    print(f"  - Covalent: {covalent_edges_3d}")
    print(f"  - Non-covalent: {noncovalent_edges_3d}")
    print(f"Edge ratio (3D/2D): {n_edges_3d/n_edges_2d:.2f}")
    
    return {
        '2d_edges': n_edges_2d,
        '3d_edges': n_edges_3d,
        'covalent_3d': covalent_edges_3d,
        'noncovalent_3d': noncovalent_edges_3d,
        'coordinates': coords
    }

# Compare for different molecules
print("Comparison Results for Different Molecules:")
print("=" * 50)

comparison_results = {}
for name, mol in conformers.items():
    print(f"\n{name}:")
    results = compare_2d_vs_3d_graphs(mol, cutoff_distance=4.0)
    comparison_results[name] = results

## 9. Advanced 3D Features <a name="advanced-3d-features"></a>

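Beyond pairwise distances, a 3D conformer supports whole-molecule shape descriptors. The cell below computes the convex hull volume and area, the principal moments of inertia (with the derived asphericity and acylindricity), the molecular diameter, and the radius of gyration, then appends a subset of these to every node's feature vector: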

In [None]:
def calculate_geometric_features(mol, coordinates):
    """
    Calculate advanced geometric features for 3D molecular graphs.
    """
    n_atoms = len(coordinates)
    features = {}
    
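    # 1. Convex hull volume and surface area (a rough proxy for molecular size)
    try:
        from scipy.spatial import ConvexHull
        hull = ConvexHull(coordinates)
        features['convex_hull_volume'] = hull.volume
        features['convex_hull_area'] = hull.area
    except Exception:
        # Qhull fails for fewer than 4 atoms or for coplanar coordinates;
        # a missing SciPy installation (ImportError) is caught here as well
        features['convex_hull_volume'] = 0.0
        features['convex_hull_area'] = 0.0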
    
    # 2. Principal moments of inertia
    center_of_mass = np.mean(coordinates, axis=0)
    centered_coords = coordinates - center_of_mass
    
    # Inertia tensor (assuming unit masses)
    I = np.zeros((3, 3))
    for coord in centered_coords:
        x, y, z = coord
        I[0, 0] += y*y + z*z
        I[1, 1] += x*x + z*z
        I[2, 2] += x*x + y*y
        I[0, 1] -= x*y
        I[0, 2] -= x*z
        I[1, 2] -= y*z
    
    I[1, 0] = I[0, 1]
    I[2, 0] = I[0, 2]
    I[2, 1] = I[1, 2]
    
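    # The inertia tensor is symmetric, so eigvalsh returns real eigenvalues
    # in ascending order (eigvals may return complex values in arbitrary order)
    eigenvalues = np.linalg.eigvalsh(I)
    
    features['principal_moments'] = eigenvalues
    # Note: asphericity and acylindricity are conventionally defined on the
    # eigenvalues of the gyration tensor; here the same formulas are applied to
    # the principal moments of inertia, so values are not comparable to the literature
    features['asphericity'] = eigenvalues[2] - 0.5*(eigenvalues[0] + eigenvalues[1])
    features['acylindricity'] = eigenvalues[1] - eigenvalues[0]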
    
    # 3. Molecular diameter
    pairwise_distances = []
    for i in range(n_atoms):
        for j in range(i+1, n_atoms):
            dist = np.linalg.norm(coordinates[i] - coordinates[j])
            pairwise_distances.append(dist)
    
    # Single-atom inputs have no atom pairs, so fall back to 0.0
    features['diameter'] = max(pairwise_distances) if pairwise_distances else 0.0
    features['radius'] = features['diameter'] / 2
    
    # 4. Compactness measures
    features['radius_of_gyration'] = np.sqrt(np.mean(np.sum(centered_coords**2, axis=1)))
    
    return features

def create_advanced_3d_graph(mol, cutoff_distance=5.0):
    """
    Create 3D molecular graph with advanced geometric features.
    """
    # Basic 3D graph
    node_feat, edge_idx, edge_feat, coords = create_3d_molecular_graph(
        mol, cutoff_distance=cutoff_distance
    )
    
    # Add geometric features
    geom_features = calculate_geometric_features(mol, coords)
    
    # Add global features to each node (graph-level features)
    n_atoms = len(coords)
    global_features = np.array([
        geom_features['diameter'],
        geom_features['radius_of_gyration'],
        geom_features['asphericity'],
        geom_features['acylindricity']
    ])
    
    # Broadcast global features to all nodes
    global_feat_matrix = np.tile(global_features, (n_atoms, 1))
    
    # Combine with existing node features
    enhanced_node_features = np.concatenate([node_feat, global_feat_matrix], axis=1)
    
    return enhanced_node_features, edge_idx, edge_feat, coords, geom_features

# Test advanced 3D features
print("Advanced 3D Geometric Features:")
print("=" * 35)

for name, mol in list(conformers.items())[:3]:  # Test first 3 molecules
    print(f"\n{name}:")
    enhanced_feat, edge_idx, edge_feat, coords, geom_feat = create_advanced_3d_graph(mol)
    
    print(f"  Enhanced node features shape: {enhanced_feat.shape}")
    print(f"  Molecular diameter: {geom_feat['diameter']:.2f} Å")
    print(f"  Radius of gyration: {geom_feat['radius_of_gyration']:.2f} Å")
    print(f"  Asphericity: {geom_feat['asphericity']:.2f}")
    print(f"  Acylindricity: {geom_feat['acylindricity']:.2f}")
    if 'convex_hull_volume' in geom_feat and geom_feat['convex_hull_volume'] > 0:
        print(f"  Convex hull volume: {geom_feat['convex_hull_volume']:.2f} ų")
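
For reference, the descriptors computed in the cell above can be written compactly. This is a reading aid rather than a new definition: it assumes unit masses (as the code does), $N$ atoms at positions $\mathbf{r}_i$, and eigenvalues of the inertia tensor sorted in ascending order. The symbols $\bar{\mathbf{r}}$, $\lambda_k$, $R_g$, and $D$ are notation introduced here for illustration.

```latex
\begin{aligned}
\bar{\mathbf{r}} &= \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i,
&\mathbf{I} &= \sum_{i=1}^{N}\Big(\lVert \mathbf{r}_i-\bar{\mathbf{r}}\rVert^{2}\,\mathbf{1}_{3}
              - (\mathbf{r}_i-\bar{\mathbf{r}})(\mathbf{r}_i-\bar{\mathbf{r}})^{\top}\Big) \\
\text{asphericity} &= \lambda_3 - \tfrac{1}{2}(\lambda_1+\lambda_2),
&\text{acylindricity} &= \lambda_2 - \lambda_1,
\qquad \lambda_1 \le \lambda_2 \le \lambda_3 \\
R_g &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert \mathbf{r}_i-\bar{\mathbf{r}}\rVert^{2}},
&D &= \max_{i<j}\,\lVert \mathbf{r}_i-\mathbf{r}_j\rVert
\end{aligned}
```

Expanding $\mathbf{I}$ component by component recovers the loop in `calculate_geometric_features`: the diagonal entry $I_{xx}$ sums $y^2 + z^2$, and the off-diagonal entry $I_{xy}$ sums $-xy$.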

### ✅ Checkpoint: Understanding 3D Molecular Graphs

To reinforce your understanding, try answering these questions:

1. **Question**: What is the main difference between 2D and 3D molecular graphs?
   - **Answer**: 2D graphs only consider chemical connectivity (bonds), while 3D graphs incorporate spatial positions and can include distance-based edges representing non-covalent interactions.

2. **Question**: Why might a 3D molecular graph have more edges than a 2D graph?
   - **Answer**: In addition to covalent bonds, 3D graphs connect every pair of atoms that lies within the cutoff distance. These proximity edges can capture through-space contacts such as hydrogen bonds and van der Waals interactions.

3. **Question**: What role does the cutoff distance play in 3D graph construction?
   - **Answer**: The cutoff distance determines which atom pairs are connected by distance-based edges. Larger cutoffs create denser graphs but may include irrelevant long-range interactions.

4. **Question**: How does conformational flexibility affect 3D molecular graphs?
   - **Answer**: Different conformers of the same molecule will have different 3D coordinates and potentially different distance-based edges, leading to different graph representations for the same chemical structure.
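
The last two answers can be made concrete with a toy example. The sketch below does not use RDKit or the tutorial's `create_3d_molecular_graph`; it builds an idealized four-atom carbon chain from an assumed bond length (1.54 Å), bond angle (109.5°), and dihedral angle, then counts atom pairs within a cutoff. The helper names `chain_coords` and `count_distance_edges` are hypothetical, and the `<=` comparison is an assumption about how the cutoff is applied.

```python
import numpy as np

def chain_coords(dihedral_deg, bond=1.54, angle_deg=109.5):
    """Coordinates (in Å) of a 4-atom chain with idealized geometry (assumed values)."""
    theta = np.radians(angle_deg)
    phi = np.radians(dihedral_deg)
    p2 = np.zeros(3)
    p3 = np.array([bond, 0.0, 0.0])
    # p1 makes the bond angle theta at p2, in the xy-plane
    p1 = p2 + bond * np.array([np.cos(theta), np.sin(theta), 0.0])
    # p4 makes the bond angle theta at p3, rotated by the dihedral phi about the p2-p3 axis
    p4 = p3 + bond * np.array([-np.cos(theta),
                               np.sin(theta) * np.cos(phi),
                               np.sin(theta) * np.sin(phi)])
    return np.stack([p1, p2, p3, p4])

def count_distance_edges(coords, cutoff):
    """Number of unordered atom pairs whose distance is at most `cutoff`."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    upper = np.triu_indices(len(coords), k=1)  # each pair counted once
    return int(np.sum(dist[upper] <= cutoff))

anti = chain_coords(180.0)    # extended conformer
gauche = chain_coords(60.0)   # folded conformer

for cutoff in (2.0, 3.5, 4.0):
    print(f"cutoff {cutoff:.1f} Å: anti={count_distance_edges(anti, cutoff)}, "
          f"gauche={count_distance_edges(gauche, cutoff)}")
```

In this idealized geometry the two end atoms are about 3.9 Å apart in the anti conformer and about 3.0 Å apart in the gauche conformer, so a 3.5 Å cutoff yields different graphs for the same connectivity, while a 4.0 Å cutoff hides the difference.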

### ✅ Checkpoint Exercise: Build Your Own 3D Molecular Graph

Try these exercises to practice what you've learned:

1. **Basic Exercise**: Choose a molecule with rotatable bonds (e.g., "CCCCCO", 1-pentanol) and:
   - Generate multiple conformers
   - Create 3D graphs for each conformer
   - Compare the edge counts and average distances

2. **Intermediate Exercise**: For caffeine ("CN1C=NC2=C1C(=O)N(C)C(=O)N2C"):
   - Create 3D graphs with cutoffs of 3, 4, and 5 Å
   - Identify which interactions (beyond covalent bonds) are captured at each cutoff
   - Calculate and compare geometric descriptors

3. **Advanced Exercise**: Design a function that automatically selects an optimal cutoff distance based on molecular size (e.g., using radius of gyration as a guide).
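
As a possible starting point for the advanced exercise, the sketch below scales the cutoff with the radius of gyration and clamps it to a fixed range. Everything here is an assumption made for illustration: the function name `suggest_cutoff`, the scale factor of 1.5, and the 3 to 6 Å bounds are not taken from the tutorial or from any library, and a sensible choice would need to be validated on your own data.

```python
import numpy as np

def radius_of_gyration(coords):
    """Root-mean-square distance of atoms from their centroid (unit masses)."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(centered ** 2, axis=1))))

def suggest_cutoff(coords, scale=1.5, min_cutoff=3.0, max_cutoff=6.0):
    """Hypothetical heuristic: cutoff = scale * Rg, clamped to [min_cutoff, max_cutoff]."""
    rg = radius_of_gyration(coords)
    return float(np.clip(scale * rg, min_cutoff, max_cutoff))

# Toy coordinates: two atoms placed symmetrically about the origin
small = [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]     # Rg = 1 Å
medium = [[-3.0, 0.0, 0.0], [3.0, 0.0, 0.0]]    # Rg = 3 Å
large = [[-10.0, 0.0, 0.0], [10.0, 0.0, 0.0]]   # Rg = 10 Å

for name, coords in [("small", small), ("medium", medium), ("large", large)]:
    print(f"{name}: Rg = {radius_of_gyration(coords):.2f} Å, "
          f"suggested cutoff = {suggest_cutoff(coords):.2f} Å")
```

The clamp keeps very small molecules from losing all distance-based edges and keeps large molecules from becoming fully connected. In practice, the coordinates returned by the tutorial's graph-construction functions could be passed in place of the toy arrays.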

### Converting to PyTorch Geometric Format

Let's convert our 3D molecular graphs to PyG format for use with graph neural networks:

In [None]:
def smiles_to_3d_pyg(smiles: str, cutoff_distance=4.0, optimize=True):
    """
    Convert a SMILES string to a 3D PyTorch Geometric Data object.
    
    Args:
        smiles (str): SMILES string of the molecule
        cutoff_distance (float): Distance cutoff for creating edges
        optimize (bool): Whether to optimize the 3D geometry
    
    Returns:
        torch_geometric.data.Data: PyG Data object with 3D features
    """
    # Generate 3D conformer
    mol = generate_3d_conformer(smiles, optimize=optimize)
    
    # Create 3D graph
    enhanced_feat, edge_idx, edge_feat, coords, geom_feat = create_advanced_3d_graph(
        mol, cutoff_distance=cutoff_distance
    )
    
    # Convert to PyTorch tensors
    x = torch.tensor(enhanced_feat, dtype=torch.float)
    edge_index = torch.tensor(edge_idx, dtype=torch.long).t().contiguous()
    edge_attr = torch.tensor(edge_feat, dtype=torch.float)
    pos = torch.tensor(coords, dtype=torch.float)  # 3D coordinates
    
    # Create PyG Data object
    data = Data(
        x=x,
        edge_index=edge_index,
        edge_attr=edge_attr,
        pos=pos,  # 3D coordinates as 'pos' attribute
        smiles=smiles
    )
    
    # Add molecular-level features as graph attributes
    data.molecular_diameter = torch.tensor([geom_feat['diameter']], dtype=torch.float)
    data.radius_of_gyration = torch.tensor([geom_feat['radius_of_gyration']], dtype=torch.float)
    data.asphericity = torch.tensor([geom_feat['asphericity']], dtype=torch.float)
    
    return data

# Test 3D PyG conversion
print("Converting molecules to 3D PyG format:")
print("=" * 40)

test_molecules_3d = ["CO", "CCO", "c1ccccc1"]  # methanol, ethanol, benzene

for smiles in test_molecules_3d:
    try:
        data_3d = smiles_to_3d_pyg(smiles, cutoff_distance=4.0)
        
        print(f"SMILES: {smiles}")
        print(f"  Node features: {data_3d.x.shape}")
        print(f"  Edge index: {data_3d.edge_index.shape}")
        print(f"  Edge features: {data_3d.edge_attr.shape}")
        print(f"  3D coordinates: {data_3d.pos.shape}")
        print(f"  Molecular diameter: {data_3d.molecular_diameter.item():.2f} Å")
        print(f"  Radius of gyration: {data_3d.radius_of_gyration.item():.2f} Å")
        print()
        
    except Exception as e:
        print(f"SMILES: {smiles} - Error: {e}")
        print()

## 10. Conclusion <a name="conclusion"></a>

This tutorial introduced you to 3D molecular representation for graph neural networks. Here are the key takeaways:

### Key Concepts Learned

1. **3D Conformers**: Molecules exist in 3D space with specific geometric arrangements that profoundly affect their properties and behavior.

2. **Conformational Flexibility**: Many molecules can adopt multiple 3D shapes, each with different energies and properties.

3. **Distance-Based Graphs**: 3D molecular graphs can include both covalent bonds and non-covalent interactions based on spatial proximity.

4. **Geometric Features**: 3D representations enable shape descriptors such as asphericity, radius of gyration, and convex hull volume and surface area.

5. **Visualization**: 3D visualization helps understand the spatial relationships and interactions within molecules.

### Advantages of 3D Representations

- **Captures stereochemistry** and conformational effects
- **Includes non-covalent interactions** (H-bonds, π-π stacking, etc.)
- **Enables geometric descriptors** that correlate with molecular properties
- **Better represents drug-target interactions** and binding

### Challenges and Considerations

- **Conformational sampling**: Which conformer(s) to use?
- **Computational cost**: Conformer generation and optimization add preprocessing time, and all-pairs distance calculations scale quadratically with the number of atoms
- **Cutoff distance selection**: Balance between capturing interactions and avoiding noise
- **Dynamic nature**: Molecules are flexible, but we represent static snapshots

### When to Use 3D vs 2D

**Use 3D representations when:**
- Stereochemistry matters (drug design, catalysis)
- Non-covalent interactions are important
- Predicting properties related to molecular shape/size
- Working with conformationally flexible molecules

**Use 2D representations when:**
- Computational efficiency is critical
- Chemical connectivity is the primary concern
- Working with large datasets where 3D generation is impractical
- Focusing on chemical reaction prediction

### Next Steps

Now that you understand both 2D and 3D molecular representations, you're ready to:
1. Build graph neural networks that can handle both representation types
2. Explore how different representations affect model performance
3. Learn about advanced GNN architectures designed for 3D molecular data
4. Investigate ensemble methods that combine multiple conformers

### Additional Resources

1. **"Geometric Deep Learning"** by Bronstein et al. - Comprehensive overview of GNNs for 3D data
2. **"DimeNet: Directional Message Passing for Molecular Graphs"** - Advanced 3D molecular GNN architecture
3. **RDKit Documentation** - Comprehensive guide to conformer generation and 3D molecular descriptors
4. **PyTorch Geometric Tutorials** - Implementation examples for 3D molecular GNNs

### ✅ Final Challenge: Comprehensive 3D Analysis

Put your knowledge to the test with this comprehensive exercise:

**Challenge**: Choose a drug molecule (e.g., aspirin, ibuprofen, or caffeine) and perform a complete 3D analysis:

1. **Multi-conformer analysis**: Generate 10-20 conformers and analyze their energy distribution
2. **Graph comparison**: Compare 2D vs 3D graph representations across different conformers
3. **Feature analysis**: Calculate and compare geometric descriptors across conformers
4. **Visualization**: Create compelling 3D visualizations showing conformational diversity
5. **PyG conversion**: Convert all conformers to PyG format for ML applications

This exercise will reinforce all concepts from this tutorial and prepare you for real-world applications of 3D molecular GNNs.

**Bonus**: Investigate how conformational flexibility might affect molecular property prediction by comparing the variance in calculated descriptors across conformers.
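
For the bonus, one way to quantify descriptor variability is to compute each descriptor per conformer and then summarize across conformers. The sketch below assumes each conformer is available as an `(n_atoms, 3)` coordinate array; the helper names `shape_descriptors` and `descriptor_spread` are hypothetical, and the two toy "conformers" are placeholders for coordinates you would extract from real conformers.

```python
import numpy as np

def shape_descriptors(coords):
    """Radius of gyration and diameter (both in Å) for one conformer, unit masses."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    rg = np.sqrt(np.mean(np.sum(centered ** 2, axis=1)))
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return {"radius_of_gyration": float(rg), "diameter": float(dist.max())}

def descriptor_spread(conformer_coords):
    """Mean and population standard deviation of each descriptor across conformers."""
    rows = [shape_descriptors(c) for c in conformer_coords]
    summary = {}
    for key in rows[0]:
        values = np.array([row[key] for row in rows])
        summary[key] = (float(values.mean()), float(values.std()))
    return summary

# Toy input: two 2-atom "conformers" with separations of 2 Å and 4 Å
toy_conformers = [
    np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
    np.array([[-2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]),
]

for key, (mean, std) in descriptor_spread(toy_conformers).items():
    print(f"{key}: mean = {mean:.2f} Å, std = {std:.2f} Å")
```

A large standard deviation relative to the mean suggests that a single-conformer representation may be a poor summary of that molecule, which is one motivation for the ensemble methods mentioned under Next Steps.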