# RPFR-GUI Quick Start

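This notebook walks through the basic RPFR-GUI workflow: building a molecule index, resolving SMILES strings to molecule IDs, loading RPFR data, and building and analyzing an isotope exchange graph.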

## Setup

First, let's import the necessary modules and set up paths.

In [1]:
import sys
from pathlib import Path

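# Add src/ to the import path. This assumes the notebook runs two directory
# levels below the project root; adjust the number of .parent hops otherwise.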
project_root = Path.cwd().parent.parent
sys.path.insert(0, str(project_root / "src"))

from rpfr_gui.data import ChemistryResolver, H5Provider
from rpfr_gui.domain import IsotopeGraph

print(f"Project root: {project_root}")

Project root: /Users/simonandren/Isotope-Informed-CRN
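
The fixed `.parent.parent` hop above depends on where the notebook is launched from. As a hedged alternative, the sketch below walks up from the current directory until it finds a folder containing `src`. The helper name `find_project_root` and the choice of `src` as the marker are assumptions made for illustration; they are not part of RPFR-GUI.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`.

    Hypothetical helper, not part of RPFR-GUI. Returns None if no ancestor
    contains a directory named `marker`.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None


# Search upward from the current working directory.
project_root = find_project_root(Path.cwd())
print(f"Project root: {project_root}")
```

If the helper returns `None`, no ancestor contains a `src` directory, and the path should be set by hand.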


## 1. Generate Index (One-time Setup)

Before using the ChemistryResolver, we need to generate an index from the HDF5 file. This only needs to be done once.

In [2]:
# Define paths
h5_path = Path("/Users/simonandren/QM9_H5/qm9s.h5")  # Adjust to your path
index_path = project_root / "data/processed/index.parquet"

# Create output directory if needed
index_path.parent.mkdir(parents=True, exist_ok=True)

# Build index (skip if already exists)
if not index_path.exists():
    print("Building index... (this may take a few minutes)")
    ChemistryResolver.build_index(
        h5_path=h5_path,
        output_path=index_path,
        smiles_dataset="SMILES",
        limit=1000,  # Remove this to index all molecules
    )
else:
    print(f"Index already exists at {index_path}")

Index already exists at /Users/simonandren/Isotope-Informed-CRN/data/processed/index.parquet


## 2. Resolve Molecule IDs

Use the ChemistryResolver to convert SMILES strings to molecule IDs.

In [3]:
# Initialize resolver
resolver = ChemistryResolver(index_path)

# Resolve single molecule
methane_id = resolver.resolve("C", id_type="smiles")
print(f"Methane (C) → Molecule ID: {methane_id}")

# Batch resolve
molecules = ["C", "CC", "CCO", "CO"]  # Methane, Ethane, Ethanol, Methanol
mol_ids = resolver.batch_resolve(molecules, id_type="smiles")

for smiles, mol_id in mol_ids.items():
    status = "✓" if mol_id else "✗"
    print(f"{status} {smiles:10s} → {mol_id}")

Methane (C) → Molecule ID: 000001
✓ C          → 000001
✓ CC         → 000007
✓ CCO        → 000014
✓ CO         → 000008
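
When resolving larger batches, it can help to separate hits from misses before loading any data. The sketch below assumes that `batch_resolve` returns a dict mapping each SMILES string to a molecule ID, with `None` for unresolved entries. The loop above suggests this, but it is not confirmed here. `partition_resolved` is a hypothetical helper, and the `N#N` entry is a made-up miss for illustration.

```python
from typing import Dict, List, Optional, Tuple


def partition_resolved(
    mol_ids: Dict[str, Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Split a SMILES -> ID mapping into resolved entries and unresolved SMILES."""
    resolved = {s: m for s, m in mol_ids.items() if m is not None}
    missing = [s for s, m in mol_ids.items() if m is None]
    return resolved, missing


# The first four entries mirror the output above; "N#N" -> None is an
# invented example of an unresolved molecule (assumption, not real output).
example = {"C": "000001", "CC": "000007", "CCO": "000014", "CO": "000008", "N#N": None}

resolved, missing = partition_resolved(example)
print(f"Resolved {len(resolved)} of {len(example)}; missing: {missing}")
```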


## 3. Load RPFR Data

Use the H5Provider to lazily load RPFR data for specific molecules.

In [4]:
# Initialize provider
provider = H5Provider(h5_path)

# Load data for methane
if methane_id:
    rpfr_data = provider.get_rpfr(methane_id, temperature=300.0)
    print("\nMethane RPFR data at 300K:")
    print(rpfr_data)
    
    # Get structure
    smiles = provider.get_structure(methane_id)
    print(f"\nSMILES: {smiles}")


Methane RPFR data at 300K:
   Atom_Index Atom_Symbol  RPFR_300K
0           0           C   1.117726
1           1           H  11.227987
2           2           H  11.228070
3           3           H  11.228245
4           4           H  11.228226

SMILES: C


## 4. Build Isotope Exchange Network

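Create a graph representing the isotope exchange system. Each atom becomes a node, and edges connect sites that can exchange isotopes. In the summary below, `full` connectivity yields 6 edges among the four hydrogens and leaves the lone carbon as its own component.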

In [5]:
# Initialize graph
graph = IsotopeGraph(connectivity="full", mass_law_enabled=False)

# Add methane to the graph
if methane_id and rpfr_data is not None:
    node_ids = graph.add_molecule(methane_id, rpfr_data)
    print(f"Added {len(node_ids)} atoms to the graph")
    print(f"Node IDs: {node_ids}")
    
    # Set connectivity (all H atoms can exchange)
    graph.set_connectivity(mode="full")
    
    # Summary
    summary = graph.summary()
    print("\nGraph Summary:")
    for key, value in summary.items():
        print(f"  {key}: {value}")

Added 5 atoms to the graph
Node IDs: ['000001_0', '000001_1', '000001_2', '000001_3', '000001_4']

Graph Summary:
  num_nodes: 5
  num_edges: 6
  elements: ['C', 'H']
  num_molecules: 1
  num_connected_components: 2
  connectivity_mode: full
  anchor_set: False
  mass_law_enabled: False


## 5. Analyze Relative RPFR

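Set an anchor site, then compute each site's RPFR relative to it. In the output below, each relative value matches the site's RPFR divided by the anchor's RPFR (for example, 11.227987 / 1.117726 ≈ 10.0454 for the first hydrogen).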

In [6]:
if methane_id and rpfr_data is not None:
    # Set carbon as anchor (first atom)
    graph.set_anchor(node_ids[0])
    
    # Get relative RPFR values
    df_relative = graph.get_rpfr_dataframe(relative=True)
    
    print("\nRelative RPFR (normalized to carbon):")
    print(df_relative[["atom_symbol", "relative_rpfr"]])


Relative RPFR (normalized to carbon):
  atom_symbol  relative_rpfr
0           C       1.000000
1           H      10.045380
2           H      10.045455
3           H      10.045611
4           H      10.045594
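
The normalization can be reproduced by hand as a sanity check. The sketch below assumes that `relative=True` divides every site's RPFR by the anchor's RPFR. The printed numbers are consistent with this, but it is inferred from the output rather than taken from the library's documentation. `relative_to_anchor` is a hypothetical helper, and the input values are copied from the methane table in section 3.

```python
from typing import Dict

# Absolute RPFR values at 300 K, copied from the methane table in section 3.
rpfr_300k: Dict[str, float] = {
    "000001_0": 1.117726,   # C (anchor)
    "000001_1": 11.227987,  # H
    "000001_2": 11.228070,  # H
    "000001_3": 11.228245,  # H
    "000001_4": 11.228226,  # H
}


def relative_to_anchor(values: Dict[str, float], anchor: str) -> Dict[str, float]:
    """Divide every value by the anchor's value (assumed normalization)."""
    anchor_value = values[anchor]
    return {node: value / anchor_value for node, value in values.items()}


relative = relative_to_anchor(rpfr_300k, anchor="000001_0")
for node, value in relative.items():
    print(f"{node}: {value:.4f}")
```

Small differences in the last digits relative to the library output are expected, because the inputs here are the rounded values shown in the table.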


## 6. Element-Specific Analysis

Extract subgraphs for specific elements.

In [7]:
if methane_id and rpfr_data is not None:
    # Extract hydrogen subgraph
    h_subgraph = graph.get_subgraph_by_element("H")
    
    print("\nHydrogen subgraph:")
    print(f"  Nodes: {h_subgraph.number_of_nodes()}")
    print(f"  Edges: {h_subgraph.number_of_edges()}")
    
    # Identify connected components
    components = graph.get_connected_components()
    print(f"\nConnected components: {len(components)}")
    for i, comp in enumerate(components, 1):
        print(f"  Component {i}: {len(comp)} nodes")


Hydrogen subgraph:
  Nodes: 4
  Edges: 6

Connected components: 2
  Component 1: 1 nodes
  Component 2: 4 nodes


## 7. Batch Processing

Process multiple molecules in a single workflow.

In [8]:
# Load multiple molecules
target_molecules = [mol_id for mol_id in mol_ids.values() if mol_id is not None]

# Create a new graph
batch_graph = IsotopeGraph(connectivity="full")

for mol_id in target_molecules[:3]:  # Limit to first 3 for demo
    rpfr_data = provider.get_rpfr(mol_id, temperature=300.0)
    if rpfr_data is not None:
        batch_graph.add_molecule(mol_id, rpfr_data)
        print(f"Added {mol_id}")

# Set connectivity
batch_graph.set_connectivity(mode="full")

# Summary
print("\nBatch Graph Summary:")
print(batch_graph.summary())

Added 000001
Added 000007
Added 000014

Batch Graph Summary:
{'num_nodes': 22, 'num_edges': 130, 'elements': ['C', 'H', 'O'], 'num_molecules': 3, 'num_connected_components': 3, 'connectivity_mode': 'full', 'anchor_set': False, 'mass_law_enabled': False}
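
The edge counts in both summaries can be checked by hand. The sketch below assumes that `full` connectivity links every pair of atoms of the same element, including across molecules. This is inferred from the reported counts, not confirmed by the library's documentation. Under that assumption, an element with n atoms contributes n(n-1)/2 edges. `expected_full_edges` is a hypothetical helper.

```python
from collections import Counter
from math import comb
from typing import Iterable


def expected_full_edges(atom_symbols: Iterable[str]) -> int:
    """Count edges if every same-element atom pair is connected (assumption)."""
    counts = Counter(atom_symbols)
    return sum(comb(n, 2) for n in counts.values())


# Atom lists for the three molecules added in the batch example.
methane = ["C"] + ["H"] * 4            # CH4
ethane = ["C"] * 2 + ["H"] * 6         # C2H6
ethanol = ["C"] * 2 + ["O"] + ["H"] * 6  # C2H5OH

print(expected_full_edges(methane))                     # 6
print(expected_full_edges(methane + ethane + ethanol))  # 130
```

Both results match the `num_edges` values reported above (6 for methane alone, 130 for the three-molecule batch). The number of connected components likewise equals the number of distinct elements: 2 for methane and 3 for the batch.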


## Next Steps

- Explore the data exploration notebook: `01_data_exploration.ipynb`
- Try visualizing molecules with RDKit
- Implement custom connectivity patterns
- Contribute to the UI layer (Phase 2)!